CogComp/trec
The Text Retrieval Conference (TREC) question classification dataset contains about 5,500 training questions (5,452 in the released train split) and 500 test questions. Each question carries one of six coarse‑grained labels and one of 50 fine‑grained labels. The questions come from four sources: 4,500 English questions released by USC, about 500 manually constructed questions for rare classes, 894 questions from TREC‑8 and TREC‑9, and 500 questions from TREC‑10, which serve as the test set. All questions are manually labeled. The task is multi‑class text classification.
Description
Dataset Overview
Basic Information
- Dataset Name: Text Retrieval Conference Question Answering (TRECQA)
- Language: English (en)
- License: Unknown
- Multilinguality: Monolingual
- Size: 1K < n < 10K
- Source: Raw data
- Task Type: Text Classification
- Task ID: Multi‑class Classification
- Papers with Code ID: trecqa
- Pretty Name: Text Retrieval Conference Question Answering
Structure
Features
- text (string): question text.
- coarse_label (categorical): coarse category, one of:
  - ABBR (0): abbreviation.
  - ENTY (1): entity.
  - DESC (2): description/abstract concept.
  - HUM (3): human.
  - LOC (4): location.
  - NUM (5): numeric.
- fine_label (categorical): fine category, grouped under ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC (see original for full list).
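The coarse label ids listed above can be turned back into readable names with a small lookup. The following is a minimal sketch, not part of the dataset itself: the helper names `coarse_id_to_name`, `coarse_name_to_id`, and `describe` are hypothetical, and the example question in the usage note is made up for illustration.

```python
# Coarse label names, indexed by the integer ids given in the feature list.
COARSE_LABELS = ["ABBR", "ENTY", "DESC", "HUM", "LOC", "NUM"]

# Short glosses for each coarse label, copied from the same list.
COARSE_GLOSS = {
    "ABBR": "abbreviation",
    "ENTY": "entity",
    "DESC": "description/abstract concept",
    "HUM": "human",
    "LOC": "location",
    "NUM": "numeric",
}


def coarse_id_to_name(label_id: int) -> str:
    """Return the short name for a coarse label id (0 to 5)."""
    if not 0 <= label_id < len(COARSE_LABELS):
        raise ValueError(f"coarse label id out of range: {label_id}")
    return COARSE_LABELS[label_id]


def coarse_name_to_id(name: str) -> int:
    """Return the integer id for a coarse label name (case-insensitive)."""
    return COARSE_LABELS.index(name.upper())


def describe(example: dict) -> str:
    """Format a record with 'text' and 'coarse_label' keys for display."""
    name = coarse_id_to_name(example["coarse_label"])
    return f"{example['text']} -> {name} ({COARSE_GLOSS[name]})"
```

For instance, `describe({"text": "What does NASA stand for ?", "coarse_label": 0})` formats an invented record as an ABBR question. The fine labels are not covered here because the full list is not reproduced in this card.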
Splits
- train: 5,452 samples
- test: 500 samples
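One way to confirm that a downloaded copy matches the split sizes above is sketched below. It assumes the Hugging Face `datasets` library is installed and that the dataset is available under the identifier `CogComp/trec` (taken from the title of this card); the function names `check_split_sizes`, `load_trec`, and `verify_download` are hypothetical.

```python
# Split sizes as stated in this card.
EXPECTED_SPLIT_SIZES = {"train": 5452, "test": 500}


def check_split_sizes(actual: dict) -> list:
    """Return a list of mismatch messages; an empty list means all sizes match."""
    problems = []
    for split, expected in EXPECTED_SPLIT_SIZES.items():
        found = actual.get(split)
        if found != expected:
            problems.append(f"{split}: expected {expected}, found {found}")
    return problems


def load_trec():
    """Download the dataset (assumption: it is hosted as 'CogComp/trec')."""
    # Imported lazily so check_split_sizes works without the library installed.
    from datasets import load_dataset

    return load_dataset("CogComp/trec")


def verify_download() -> list:
    """Load the dataset and compare its split sizes with the expected ones."""
    ds = load_trec()
    sizes = {name: split.num_rows for name, split in ds.items()}
    return check_split_sizes(sizes)
```

Calling `verify_download()` requires network access; `check_split_sizes` can be used on its own with any dictionary of split sizes.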
Dataset Creation
Summary
- Training set: 5,452 labeled questions
- Test set: 500 labeled questions
- Coarse categories: 6
- Fine categories: 50
- Average sentence length: 10 tokens
- Vocabulary size: 8,700
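The average length and vocabulary size above can be approximated for any list of questions with a few lines of Python. This is a rough sketch only: the tokenization behind the reported figures is not stated in this card, so lowercasing and whitespace splitting are assumptions, and the function name `corpus_stats` is hypothetical. Results on the real data may therefore differ from the numbers listed.

```python
from collections import Counter


def corpus_stats(questions):
    """Return (average tokens per question, vocabulary size).

    Assumption: tokens are lowercased, whitespace-separated strings.
    """
    vocab = Counter()
    total_tokens = 0
    for question in questions:
        tokens = question.lower().split()
        total_tokens += len(tokens)
        vocab.update(tokens)
    # Guard against an empty input list to avoid dividing by zero.
    average = total_tokens / len(questions) if questions else 0.0
    return average, len(vocab)
```

Passing the `text` column of the train split to `corpus_stats` gives a quick comparison against the figures in the summary.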
Sources
- 4,500 questions from USC (Hovy et al., 2001)
- ~500 manually created questions for rare classes
- 894 questions from TREC‑8 and TREC‑9
- 500 TREC‑10 questions as test set
Citation
@inproceedings{li-roth-2002-learning,
title = "Learning Question Classifiers",
author = "Li, Xin and Roth, Dan",
booktitle = "COLING 2002",
year = "2002",
url = "https://www.aclweb.org/anthology/C02-1150",
}
@inproceedings{hovy-etal-2001-toward,
title = "Toward Semantics‑Based Answer Pinpointing",
author = "Hovy, Eduard and Gerber, Laurie and ...",
booktitle = "First International Conference on Human Language Technology Research",
year = "2001",
url = "https://www.aclweb.org/anthology/H01-1069",
}
Source
Organization: hugging_face
Created: Unknown