tau/commonsense_qa
CommonsenseQA is a multiple‑choice question answering dataset that requires several types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractors. The dataset is in English and is split into training, validation, and test sets.
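To make the question format concrete, here is a minimal Python sketch of a single record, using the field names listed under Dataset Structure. The record itself is hypothetical and invented for illustration; it is not taken from the dataset. The sketch also assumes that `choices` stores two parallel lists (`label` and `text`) and that the five options are labelled A through E. The helper names `format_prompt`, `correct_text`, and `distractors` are likewise illustrative.

```python
from typing import List

# Hypothetical record, invented for illustration (NOT an actual dataset row).
# Assumption: "choices" holds two parallel lists, "label" and "text".
example = {
    "id": "example-0001",
    "question": "Where would you put a kettle to heat water?",
    "question_concept": "kettle",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["stove", "bookshelf", "bathtub", "garden", "mailbox"],
    },
    "answerKey": "A",
}


def format_prompt(record: dict) -> str:
    """Render the question followed by one 'label. text' line per option."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    if len(labels) != len(texts):
        raise ValueError("choices.label and choices.text must have equal length")
    lines = [record["question"]]
    lines += [f"{label}. {text}" for label, text in zip(labels, texts)]
    return "\n".join(lines)


def correct_text(record: dict) -> str:
    """Return the option text whose label matches answerKey."""
    index = record["choices"]["label"].index(record["answerKey"])
    return record["choices"]["text"][index]


def distractors(record: dict) -> List[str]:
    """Return the texts of every option other than the correct one."""
    key = record["answerKey"]
    pairs = zip(record["choices"]["label"], record["choices"]["text"])
    return [text for label, text in pairs if label != key]


print(format_prompt(example))
print("answer:", correct_text(example))
```

If the Hugging Face `datasets` library is installed and the identifier at the top of this card resolves on the Hub, `load_dataset("tau/commonsense_qa")` should yield records of roughly this shape; treat that as an assumption to verify against the loaded data.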
Dataset description and usage context
Dataset Overview
Basic Information
- Name: CommonsenseQA
- Language: English (en)
- License: MIT
- Multilinguality: Monolingual
- Size: 1K<n<10K
- Source Data: Original data
- Task Category: Question Answering
- Task ID: open-domain-qa
- Paper Code ID: commonsenseqa
- Display Name: CommonsenseQA
Dataset Structure
- Features:
  - id: String type, unique ID.
  - question: String type, question description.
  - question_concept: String type, concept related to the question.
  - choices: Dictionary type, containing option labels and texts.
    - label: String type, option label.
    - text: String type, option text.
  - answerKey: String type, correct answer.
- Data Splits:
  - train: 9,741 samples, 2,207,794 bytes.
  - validation: 1,221 samples, 273,848 bytes.
  - test: 1,140 samples, 257,842 bytes.
- Total Download Size: 1,558,570 bytes.
- Total Dataset Size: 2,739,484 bytes.
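The split figures above can be cross-checked against the stated totals with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from this card, and the names `SPLITS`, `totals`, and `shares` are illustrative rather than part of any library.

```python
from typing import Dict, Tuple

# Sample counts and byte sizes copied from the Data Splits list on this card.
SPLITS: Dict[str, Dict[str, int]] = {
    "train": {"samples": 9_741, "bytes": 2_207_794},
    "validation": {"samples": 1_221, "bytes": 273_848},
    "test": {"samples": 1_140, "bytes": 257_842},
}


def totals(splits: Dict[str, Dict[str, int]]) -> Tuple[int, int]:
    """Sum sample counts and byte sizes across all splits."""
    total_samples = sum(s["samples"] for s in splits.values())
    total_bytes = sum(s["bytes"] for s in splits.values())
    return total_samples, total_bytes


def shares(splits: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Return each split's fraction of the total sample count."""
    total_samples, _ = totals(splits)
    return {name: s["samples"] / total_samples for name, s in splits.items()}


total_samples, total_bytes = totals(SPLITS)
print(total_samples)  # 12102, the question count given in the description
print(total_bytes)    # 2739484, the Total Dataset Size listed above

for name, fraction in shares(SPLITS).items():
    print(f"{name}: {fraction:.1%}")
```

Both sums agree with the card: the three splits add up to 12,102 questions and 2,739,484 bytes.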
Dataset Creation
- Annotation Creators: Crowd‑sourced
- Language Creators: Crowd‑sourced
Usage Considerations
- License: MIT.
- Citation Information:
  @inproceedings{talmor-etal-2019-commonsenseqa,
      title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
      author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
      booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
      month = jun,
      year = "2019",
      address = "Minneapolis, Minnesota",
      publisher = "Association for Computational Linguistics",
      url = "https://aclanthology.org/N19-1421",
      doi = "10.18653/v1/N19-1421",
      pages = "4149--4158",
      archivePrefix = "arXiv",
      eprint = "1811.00937",
      primaryClass = "cs",
  }
Contributors