Pointers to Internet resources
Bibliographies
Bibliography of constructive induction - feature engineering
Bibliography on Automated Text Categorization
Bibliography - Text Categorization
Automatic Text Processing related short bibliography
Feature Subset Selection Bibliography
Bibliography of NLP in Biomedicine
Lifelong learning, meta-learning
Spam Bibliography
Machine Learning Bibliographies
Machine Learning Applied to Text
Feature Selection
Computer Science Bibliographies
TDT Publications
Bibliography on Transformation-Based Learning
Projects
Common Sense
Open Mind
OpenCyc
ThoughtTreasure home page
Cycorp
Companies and organizations
Electronic pocket talking dictionaries and translators
ARDA Home Page
Web Intelligence Consortium
AvaQuest, Inc. Resources - Categorization Vendors
Open source projects
Senga
The OpenNLP Homepage
Worldwide Lexicon
NLP Toolkit
POPFile Automatic Email Sorting using Naive Bayes
linguana
Morphix-NLP -- The most NLP application on one CD!
ZSoft platform-independent solutions for Data Mining
Spam mail
Email spam
A Plan for Spam
POPFile Automatic Email Sorting using Naive Bayes
Spammunition
Internet Content Filtering Group
Machine Learning Laboratory
Snowfox Home
Research proposal
Welcome to Cross Language Evaluation Forum
Text REtrieval Conference (TREC)
WebBase Project
Data Mining on the Web (mentions OpenDir)
WebKB
search.cpan.org Ken Williams - AI-Categorizer
WebKB@CMU
Interspace
Center for Automated Learning and Discovery
Columbia Newsblaster
Google Web APIs - Home
The Lemur Toolkit for Language Modeling and Information Retrieval
UNLP General Information
Text categorization using lexical chains
Kernel Methods for Image and Text
Natural Language Processing (NLP) at Cornell
The CAPTCHA Project
Demo of semantic word orientation
Tools
SVM
LIBSVM
MATLAB Support Vector Machine Toolbox
SVM-Light Support Vector Machine
SvmFu Documentation
mySVM
Language Identification
Language Identification Tools
Stochastic Language Identifier
Language Identification
XRCE CA Language Identifier
Welcome to Inxight Software, Inc.
OEM Products Language & Character Encoding Identification
Automatic Language Identification Bibliography
RALI -- S I L C
Identification of Language and Character Encoding
Basis Technology's Products Rosette Language Identifier
TextCat Language Guesser
Stemming
Porter in Perl
Lovins
Snowball
Porter Stemming Algorithm
Part of Speech Tagging
MULTEXT
TnT - Statistical Part-of-Speech Tagging
QTag
Eric Brill's tagger
ePost - C++ wrapper of Brill's tagger
Text categorization
The Bow Toolkit
UDC in brief
Kea - automatic keyphrase extraction
BoosTexter
SNoW
LTG software LT TCR
S-EM download page Learning with Positive and Unlabeled Data
LPU download page
Machine Learning
C4.5 - C5.0
See5 An Informal Tutorial
RuleQuest Research Data Mining Tools
Ross Quinlan - AI Group, CSE
Weka 3
The SLIPPER Rule Learning System
The WHIRL System
DTREG -- Decision Tree Analysis Program
NLREG -- Nonlinear Regression Analysis Program
SGI - MLC++ Home Page
YALE - Yet Another Learning Environment
WordNet
EuroWordNet
The Global WordNet Association
WordNet
WordNet 1.6 Vocabulary Helper
WordNet in RDF
Wordnet Domains
Richard Lexicon Home
Demos
Roget's Thesaurus
Roget's Thesaurus as an Electronic Lexical Knowledge Base
LSA and HAL
LSI - Latent Semantic Indexing Web Site
Psycholinguistics and Computational Cognition Lab
Telcordia Latent Semantic Indexing (LSI) Demo Machine
LSA @ CU Boulder
Introduction to LSI
Hubs
CMU AI Repository - NLP
NL Software Registry @ DFKI
Resources
Software Tools for NLP
Speech and Language Web Resources
The Data Warehousing Information Center - Text Mining Tools
Welcome to Cognitive Computation
Sentence boundary detection
SATZ - Sentence boundary detector
MXTERMINATOR
search.cpan.org Tony G. Rose - HTML-Summary-0.017
LTG software LT TTT
Adwait Ratnaparkhi Stat NLP
Automatic English Sentence Segmenter
Lingua::EN::Sentence - Splitting text into sentences
Sentencizers
XML parsers
expat
Xerces C++ Parser
Open directory
Yahoo
About Yahoo
Open Directory - Use of ODP Data
Web Directory Sizes
ODP and Yahoo Size Projection Charts
Semantic metrics
Dekang Lin - semantic metrics
search.cpan.org Siddharth Patwardhan - WordNet-Similarity-0.03
NL parsing
Minipar
Link Grammar Parser
Apple Pie Parser
Conexor Analyzers
Misc text analysis tools
LT Group - Edinburgh
Infogistics Text Analysis tools
Senga
fnTBL Toolkit - Home
WordStat
SRI Language Modeling Toolkit
Textomy - tools for text dissection
Text summarization
Copernic Summarizer - Product Overview
search.cpan.org HTML::Summary - module for generating a summary from a web page
HTML parsers
Clean up your Web pages with HTML TIDY
HTML Tidy Project Page
Named Entity Recognition
Language-Independent Named Entity Recognition
AI Search
Local++ Project Home Page
AI C++ Search Class Library
Math
Netlib
TNT Home Page
GAMS - Guide to Available Mathematical Software
Critical t Values
Peter Hellekalek pLab Software
Pseudo random number generators
C++
STL Guide at SGI
STLport
Boost
STL Error Decryptor
Scripting
Rob van der Woude's Scripting Pages Batch Files
Sample Win9x Batch Programs
GSview
Introduction to GnuPlot
Misc
Search engines
Notess.com - The Greg Notess Web Site
Search Engine Watch
Search Tools - Information, Guides and News
Finding Information on the Internet A TUTORIAL
Search tools
Web Search @ About.com
The Internet Archive Wayback Machine
Searchengines.Ru
Search Engine Showdown
Teoma Search -- Search with Authority
KartOO
On Search, the Series
Speech Processing
Speech Recognition Update
Speech Technology Magazine online
Speechtechnology Network
Compaq.com - SpeechBot
Biometric Consortium
Book publishers
MIT Press
Addison-Wesley
Prentice Hall
W.H.Freeman and Company
Cambridge University Press
Academic Press
Kluwer Academic Publishers
Oxford University Press
The University of Chicago Press
Elsevier
John Wiley and Sons
O'Reilly and Associates
McGraw-Hill Book Company
Macmillan Computer Reference
Mailing lists
TREC filtering
Corpora
Colibri
Elsnet list
Linguist
Search Engine Report
Connectionists
WebIR
Corpora and lexicons
Hubs
SIGLEX Resources
Corpus Linguistics
English language corpora
Linguistic Data Resources on the Internet
The ACL NLP-CL Universe
W3-Corpora List of Corpora
BNC English Language Corpora and Corpus resources
David Lee's Bookmarks for Corpus-based Linguists
Online books and texts
Project Gutenberg
Electronic Text Center -- University of Virginia
The Online Books Page
RCV1
Reuters Research and Standards Group - Corpus
RCV1
Reuters-21578
Reuters-21578 Text Categorization Collection
Reuters-21578 Text Categorization Test Collection
Tools for Reuters-21578 Text Categorization Dataset
OHSUMED
Files Available to Download or View
Medical Subject Headings (MeSH)
OHSUMED (FTP)
American National Corpus
Novelty and Redundancy Detection for Adaptive Filtering DataSet
Glasgow IDOM - Test collections
ICAME
The BNC Handbook
LDC - Linguistic Data Consortium
The ELRA home page
The Oxford Text Archive
WIPO automated categorization datasets
Web Term Document Frequency Form
OPUS - an open source parallel corpus
Collocational Dictionary (ARCS)
The Moby Project
The TREC-AP Text Categorization Test Collection
Words and Phrases from the British National Corpus
Free Association Norms
Longman Dictionaries for Research (LDOCE)
Movie Review Data
Scientific search
NCSTRL Home Page
Computer Science e-Print Archive
Cora Research Paper Search
IEEE Xplore
ResearchIndex (NEC)
Welcome to the ACM Digital Library
Welcome to IEEE Transactions & Journals
Scirus - Searching for Science
Unified Computer Science TR Index (UCSTRI)
search4science
Computation and Language - ISRAEL Mirror
Other Lists of Bibliographies
Computer Science Bibliography Glimpse Server
Cornell Computer Science Technical Reports
NASA Technical Report Server (NTRS)
Papers database main page
Technical Reports - NASA LaRC Technical Library
Online publications
Journals
Journal of Artificial Intelligence Research
Journal of Machine Learning Research
Journal of Intelligent Information Systems
TAL journal - Association pour le Traitement Automatique des LAngues
Conferences
VLDB Endowment Inc.
Books and reports
Foundations of Statistical Natural Language Processing
Survey of the State of the Art in Human Language Technology
Pattern Classification - Duda, Hart, Stork
Generalized Information Measures and their Applications
Managing Gigabytes
Numerical Recipes
Data-Intensive Linguistics
ACL Anthology
Hubs on NLP, IR, ML etc
ELSnet Homepage
fabulousness - linguistics and stuff
Information Retrieval Links
Fieldmethods.net
Linguistic Resources on the Internet
Speech and Language Web Resources
Boosting Research Site Boosting.org
Survey of Information Retrieval
The Association for Computational Linguistics
The LINGUIST List
COLT Computational Learning Theory
Pattern Recognition on the Web
Statistical NLP - corpus-based resources
The ELRA home page
KDnuggets Data Mining, Web Mining, and Knowledge Discovery Guide
Information Filtering Resources
MLnet OiS - Machine Learning, Knowledge Discovery, Data Mining, Case-based Reasoning, and Knowledge Acquisition
Glasgow IDOM - IR resources
Weblog of computational linguistics
WebIR
ACL SIG on Natural Language Learning (SIGNLL)
COLE sites about Computational Linguistics
EACL
HLT Home
LaTeX
Tutorials
Advanced LaTeX
LaTeX - from quick and dirty to style and finesse
Reference
LaTeX2e Help file
Help on LaTeX commands
The LaTeX Encyclopedia
Math Symbols in LaTeX
LATEX maths and graphics
The Technion Guide to LATEX2e
Usage
CTAN LaTeX Archive
The TeX Catalog Online
TeX Users Group Home Page
Evgeniy Gabrilovich
gabr@cs.technion.ac.il
Last updated on November 30, 2011