The Text Retrieval Conference (TREC) question classification dataset contains about 5,500 training questions (5,452 in the commonly distributed release) and 500 test questions. Each question is labeled with one of six coarse-grained categories and one of 50 fine-grained categories. The questions come from four sources: 4,500 English questions released by USC, about 500 manually constructed questions, 894 questions from TREC-8 and TREC-9, and 500 questions from TREC-10, which form the test set. All questions are manually labeled. The task is multi-class text classification.
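The two-level label scheme can be illustrated with a minimal Python sketch. It assumes the plain-text layout in which each line starts with a `COARSE:fine` label followed by a space and the question text; other distributions of the dataset may store the labels differently. The helper names (`parse_line`, `LabeledQuestion`) are hypothetical, and the sample questions are made up for illustration rather than taken from the dataset.

```python
from collections import Counter
from typing import NamedTuple

# The six coarse-grained categories of the TREC question taxonomy.
COARSE_LABELS = {"ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"}


class LabeledQuestion(NamedTuple):
    coarse: str  # e.g. "NUM"
    fine: str    # e.g. "date"
    text: str    # the question itself


def parse_line(line: str) -> LabeledQuestion:
    """Split one 'COARSE:fine question text' line into its parts.

    Assumes the label and the question are separated by the first space.
    """
    label, _, text = line.strip().partition(" ")
    coarse, _, fine = label.partition(":")
    if coarse not in COARSE_LABELS:
        raise ValueError(f"unexpected coarse label: {coarse!r}")
    return LabeledQuestion(coarse, fine, text)


# Illustrative lines only; these are not copied from the dataset.
sample_lines = [
    "HUM:ind Who wrote Hamlet ?",
    "LOC:country What country is the Eiffel Tower in ?",
    "NUM:date When did the Berlin Wall fall ?",
    "NUM:count How many legs does a spider have ?",
]

questions = [parse_line(line) for line in sample_lines]

# Count questions per coarse category (the 6-way classification target).
coarse_counts = Counter(q.coarse for q in questions)
```

A classifier can then be trained against either `coarse` (six classes) or the combined `coarse:fine` label (50 classes), depending on the granularity needed.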