Reuters-21578 Text Categorization Collection

The Reuters‑21578 text categorization collection is a benchmark corpus for text classification research, released in 1999.

Updated 5/24/2019
GitHub

Description

NLP_Dataset Overview

1. Text Classification

  • Reuters‑21578 Text Categorization Collection (1999)
  • Large Movie Review Dataset v1.0 (2011)
  • Datasets for single‑label text categorization (2007)
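The Reuters‑21578 collection is distributed as 22 SGML files (`reut2-000.sgm` through `reut2-021.sgm`), each holding `<REUTERS>` records whose attributes include `TOPICS` and `LEWISSPLIT`, with `<TOPICS>`, `<TITLE>` and `<BODY>` elements inside. The following is a minimal Python sketch, assuming that record layout, that extracts documents with regular expressions and applies the commonly used ModApte split (records with `TOPICS="YES"`, divided by `LEWISSPLIT="TRAIN"` or `"TEST"`). The names `Doc`, `parse_reuters_sgml` and `modapte_split` are hypothetical and belong to no published API; a real SGML parser would be more robust than regexes.

```python
import html
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Assumed layout of each record in reut2-000.sgm ... reut2-021.sgm:
# <REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" ... NEWID="1">
#   <TOPICS><D>cocoa</D></TOPICS> ... <TITLE>...</TITLE> ... <BODY>...</BODY>
# </REUTERS>
_DOC_RE = re.compile(r"<REUTERS(?P<attrs>[^>]*)>(?P<inner>.*?)</REUTERS>", re.DOTALL)
_ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
_TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
_D_RE = re.compile(r"<D>(.*?)</D>", re.DOTALL)
_TITLE_RE = re.compile(r"<TITLE>(.*?)</TITLE>", re.DOTALL)
_BODY_RE = re.compile(r"<BODY>(.*?)</BODY>", re.DOTALL)


@dataclass
class Doc:
    new_id: str        # NEWID attribute
    topics_attr: str   # TOPICS attribute: "YES", "NO" or "BYPASS"
    split: str         # LEWISSPLIT attribute: "TRAIN", "TEST" or "NOT-USED"
    topics: List[str]  # category labels from <TOPICS><D>...</D></TOPICS>
    title: str
    body: str


def _first(pattern: "re.Pattern[str]", text: str) -> str:
    """Return the first captured group of pattern in text, or '' if absent."""
    m = pattern.search(text)
    return m.group(1) if m else ""


def _clean(raw: str) -> str:
    """Drop the &#2;/&#3; start/end markers, decode entities such as &lt;, trim."""
    raw = raw.replace("&#2;", "").replace("&#3;", "")
    return html.unescape(raw).strip()


def parse_reuters_sgml(text: str) -> Iterator[Doc]:
    """Yield one Doc per <REUTERS>...</REUTERS> record in an SGML string."""
    for m in _DOC_RE.finditer(text):
        attrs = dict(_ATTR_RE.findall(m.group("attrs")))
        inner = m.group("inner")
        yield Doc(
            new_id=attrs.get("NEWID", ""),
            topics_attr=attrs.get("TOPICS", ""),
            split=attrs.get("LEWISSPLIT", ""),
            topics=[_clean(t) for t in _D_RE.findall(_first(_TOPICS_RE, inner))],
            title=_clean(_first(_TITLE_RE, inner)),
            # Records without a <BODY> element (e.g. brief items) get body ''.
            body=_clean(_first(_BODY_RE, inner)),
        )


def modapte_split(docs: Iterable[Doc]) -> Tuple[List[Doc], List[Doc]]:
    """ModApte split: keep TOPICS="YES" records, divide by LEWISSPLIT TRAIN/TEST."""
    train: List[Doc] = []
    test: List[Doc] = []
    for d in docs:
        if d.topics_attr != "YES":
            continue
        if d.split == "TRAIN":
            train.append(d)
        elif d.split == "TEST":
            test.append(d)
    return train, test


if __name__ == "__main__":
    import sys

    # Usage: python reuters_sgml.py reut2-000.sgm reut2-001.sgm ...
    docs: List[Doc] = []
    for path in sys.argv[1:]:
        # latin-1 maps every byte to a character, so decoding cannot fail.
        with open(path, encoding="latin-1") as fh:
            docs.extend(parse_reuters_sgml(fh.read()))
    train, test = modapte_split(docs)
    print(f"train={len(train)} test={len(test)}")
```

Reading the files as latin-1 is an assumption chosen so that stray non-UTF-8 bytes do not raise decoding errors; labels in `Doc.topics` can then be fed to any multi-label classifier.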

2. Question Answering Systems

  • Stanford Question Answering Dataset (SQuAD)
  • DeepMind Question Answering Corpus
  • Amazon question/answer data

3. Speech Recognition

  • TIMIT Acoustic‑Phonetic Continuous Speech Corpus
  • VoxForge
  • LibriSpeech ASR corpus

4. Machine Translation

  • Aligned Hansards of the 36th Parliament of Canada Release 2001‑1a
  • European Parliament Proceedings Parallel Corpus 1996‑2011

5. Document Summarization

  • The AQUAINT Corpus of English News Text
  • Legal Case Reports Data Set

6. More Datasets

Biomedical Domain

  • Mutation extraction
    • MutationFinder (MF)
    • Extractor of Mutations (EMU)
    • tmVar

All data sources: http://infos.korea.ac.kr/bronco/PublicCorpus.zip

Topics

Text Classification

Source

Organization: GitHub

Created: 5/18/2019
