
CogComp/trec

The Text Retrieval Conference (TREC) question classification dataset contains about 5,500 training questions (5,452 in this release) and 500 test questions. Each question is labeled with one of six coarse-grained categories and one of 50 fine-grained categories. The questions come from four sources: 4,500 English questions released by USC, about 500 questions constructed manually for rare classes, 894 questions from TREC-8 and TREC-9, and 500 questions from TREC-10, which form the test set. All questions are labeled manually. The task is text classification, specifically multi-class classification.

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 18, 2024

Dataset Overview

Basic Information

Dataset Name: Text Retrieval Conference Question Answering (TRECQA)
Language: English (en)
License: Unknown
Multilinguality: Monolingual
Size: 1K < n < 10K
Source datasets: Original
Task Type: Text Classification
Task ID: Multi‑class Classification
Papers with Code ID: trecqa
Pretty Name: Text Retrieval Conference Question Answering

Structure

Features

text (string): question text.
coarse_label (categorical): coarse category, one of:
- ABBR (0): abbreviation.
- ENTY (1): entity.
- DESC (2): description/abstract concept.
- HUM (3): human.
- LOC (4): location.
- NUM (5): numeric.
fine_label (categorical): fine category (50 classes), grouped under ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, and NUMERIC; see the original dataset card for the full list.
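A minimal sketch of decoding the label fields. It assumes the coarse ids follow the 0 to 5 order listed above and, hypothetically, that fine label names take the form `COARSE:subclass` (for example `HUM:ind`), so the prefix before the colon identifies the parent coarse class. The names `COARSE_LABELS`, `coarse_name`, and `coarse_id_of_fine` are illustrative and are not part of the dataset or any library.

```python
# Coarse label names in the index order given in the card (0..5).
COARSE_LABELS = ["ABBR", "ENTY", "DESC", "HUM", "LOC", "NUM"]
COARSE_TO_ID = {name: i for i, name in enumerate(COARSE_LABELS)}


def coarse_name(label_id: int) -> str:
    """Map a coarse_label integer to its short name, e.g. 3 -> 'HUM'."""
    if not 0 <= label_id < len(COARSE_LABELS):
        raise ValueError(
            f"coarse_label must be in 0..{len(COARSE_LABELS) - 1}, got {label_id}"
        )
    return COARSE_LABELS[label_id]


def coarse_id_of_fine(fine_name: str) -> int:
    """Recover the coarse_label id from a fine label name.

    Hypothetical assumption: fine names are written 'COARSE:subclass',
    so the text before the colon selects the coarse class.
    """
    prefix, sep, _ = fine_name.partition(":")
    if not sep or prefix not in COARSE_TO_ID:
        raise ValueError(f"unrecognised fine label: {fine_name!r}")
    return COARSE_TO_ID[prefix]
```

For instance, `coarse_name(3)` returns `'HUM'` and `coarse_id_of_fine("LOC:city")` returns `4`. Deriving the coarse class from the fine name keeps the two label fields consistent without storing a separate 50-entry lookup table.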

Splits

train: 5,452 samples
test: 500 samples

Dataset Creation

Summary

Training set: 5,452 labeled questions
Test set: 500 labeled questions
Coarse categories: 6
Fine categories: 50
Average sentence length: 10 tokens
Vocabulary size: 8,700
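The summary figures above can be approximated with a short script. This sketch rests on two assumptions: that a raw file holds one question per line in the form `COARSE:fine question text`, and that tokens are whitespace-separated and case-sensitive. The published statistics may have used a different tokenizer, so results on the real files need not match 10 tokens and 8,700 words exactly. `parse_line` and `corpus_stats` are hypothetical helpers, and the sample lines are made up for illustration, not taken from the dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split 'COARSE:fine question text' into (coarse, fine, question).

    Assumes the label is the first whitespace-separated field.
    """
    label, _, question = line.strip().partition(" ")
    coarse, _, fine = label.partition(":")
    return coarse, f"{coarse}:{fine}", question


def corpus_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Count questions and classes, and measure length and vocabulary."""
    coarse_counts: Counter = Counter()
    fine_counts: Counter = Counter()
    vocab = set()
    total_tokens = 0
    n = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        coarse, fine, question = parse_line(line)
        tokens = question.split()  # whitespace tokenization (assumption)
        coarse_counts[coarse] += 1
        fine_counts[fine] += 1
        vocab.update(tokens)
        total_tokens += len(tokens)
        n += 1
    return {
        "questions": n,
        "coarse_classes": len(coarse_counts),
        "fine_classes": len(fine_counts),
        "avg_tokens": total_tokens / n if n else 0.0,
        "vocab_size": len(vocab),
    }


# Made-up toy lines in the assumed raw format.
SAMPLE = [
    "HUM:ind Who wrote Hamlet ?",
    "LOC:city What is the capital of France ?",
    "HUM:ind Who painted the Mona Lisa ?",
]
stats = corpus_stats(SAMPLE)
```

On the three toy lines it reports 3 questions, 2 coarse classes, 2 fine classes, and a vocabulary of 13 distinct tokens.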

Sources

4,500 questions from USC (Hovy et al., 2001)
~500 manually created questions for rare classes
894 questions from TREC‑8 and TREC‑9
500 TREC-10 questions, used as the test set

Citation

@inproceedings{li-roth-2002-learning,
    title = "Learning Question Classifiers",
    author = "Li, Xin and Roth, Dan",
    booktitle = "COLING 2002",
    year = "2002",
    url = "https://www.aclweb.org/anthology/C02-1150",
}
@inproceedings{hovy-etal-2001-toward,
    title = "Toward Semantics-Based Answer Pinpointing",
    author = "Hovy, Eduard and Gerber, Laurie and ...",
    booktitle = "First International Conference on Human Language Technology Research",
    year = "2001",
    url = "https://www.aclweb.org/anthology/H01-1069",
}