tau/commonsense_qa

CommonsenseQA is a multiple-choice question answering dataset that requires various types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractors. The questions are in English, and the dataset is split into training, validation, and test sets.

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 4, 2024

Dataset Overview

Basic Information

Name: CommonsenseQA
Language: English (en)
License: MIT
Multilinguality: Monolingual
Size Category: 1K<n<10K (tag as given at the source; the splits listed below total 12,102 samples)
Source Data: Original data
Task Category: Question Answering
Task ID: open-domain-qa
Paper Code ID: commonsenseqa
Display Name: CommonsenseQA

Dataset Structure

Features:
- id: String type, unique ID.
- question: String type, question description.
- question_concept: String type, concept related to the question.
- choices: Dictionary type, containing option labels and texts.
  - label: String type, option label.
  - text: String type, option text.
- answerKey: String type, correct answer.
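
The feature list above can be sketched as a typed record. This is a minimal illustration, not the dataset's official schema: it assumes that `label` and `text` inside `choices` are parallel lists aligned by position, and that unlabeled records carry an empty `answerKey`. The helper names `answer_text` and `format_prompt` and the sample record are hypothetical.

```python
from typing import List, Optional, TypedDict


class Choices(TypedDict):
    label: List[str]  # option labels, e.g. "A".."E" (assumed parallel to text)
    text: List[str]   # option texts, aligned with label by position


class Example(TypedDict):
    id: str
    question: str
    question_concept: str
    choices: Choices
    answerKey: str


def answer_text(example: Example) -> Optional[str]:
    """Return the text of the option whose label equals answerKey.

    Returns None when answerKey matches no label (assumption: unlabeled
    records carry an empty answerKey).
    """
    labels = example["choices"]["label"]
    texts = example["choices"]["text"]
    if example["answerKey"] not in labels:
        return None
    return texts[labels.index(example["answerKey"])]


def format_prompt(example: Example) -> str:
    """Render the question followed by one 'label. text' line per option."""
    lines = [example["question"]]
    for label, text in zip(example["choices"]["label"], example["choices"]["text"]):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


# Hypothetical record, invented for illustration (not taken from the dataset).
sample: Example = {
    "id": "example-0001",
    "question": "Where would you most likely keep milk so it stays cold?",
    "question_concept": "milk",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["bookshelf", "refrigerator", "oven", "mailbox", "bathtub"],
    },
    "answerKey": "B",
}

print(format_prompt(sample))
print("Answer:", answer_text(sample))
```

Looking up the answer by label rather than by a fixed index keeps the helper correct even if the options are reordered.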
Data Splits:
- train: 9,741 samples, 2,207,794 bytes.
- validation: 1,221 samples, 273,848 bytes.
- test: 1,140 samples, 257,842 bytes.
- Total Download Size: 1,558,570 bytes.
- Total Dataset Size: 2,739,484 bytes.
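
The split figures above can be cross-checked with a few lines of Python. The sketch below only re-adds the numbers listed on this page; the names `SPLITS`, `totals`, and `shares` are hypothetical helpers, not part of any library.

```python
from typing import Dict, Tuple

# (samples, bytes) per split, copied from the Data Splits list above.
SPLITS: Dict[str, Tuple[int, int]] = {
    "train": (9741, 2207794),
    "validation": (1221, 273848),
    "test": (1140, 257842),
}


def totals(splits: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Sum sample counts and byte sizes across all splits."""
    total_samples = sum(samples for samples, _ in splits.values())
    total_bytes = sum(size for _, size in splits.values())
    return total_samples, total_bytes


def shares(splits: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each split's fraction of the total sample count."""
    total_samples, _ = totals(splits)
    return {name: samples / total_samples for name, (samples, _) in splits.items()}


total_samples, total_bytes = totals(SPLITS)
print(total_samples)  # 12102, matching the question count in the description
print(total_bytes)    # 2739484, matching Total Dataset Size

for name, share in shares(SPLITS).items():
    print(f"{name}: {share:.1%}")
```

With the Hugging Face `datasets` library installed, `load_dataset("tau/commonsense_qa")` should return these three splits, and their lengths can be compared against the counts in `SPLITS`.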

Dataset Creation

Annotation Creators: Crowd‑sourced
Language Creators: Crowd‑sourced

Usage Considerations

License: MIT
Citation Information:

@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
    archivePrefix = "arXiv",
    eprint = "1811.00937",
    primaryClass = "cs",
}
