Resources for Text, Speech and Language Processing

Data collections

Collections maintained at this site

  1. The WordSimilarity-353 Test Collection
  2. TechTC - Technion Repository of Text Categorization Datasets

Collections maintained elsewhere

  1. Tagged datasets for named entity recognition tasks
  2. Test collections at AOL Research
  3. Linguistic Data Consortium (LDC)
  4. UCI Machine Learning Repository

Evgeniy Gabrilovich
gabr@cs.technion.ac.il

Last updated on August 7, 2006