Dataset asset, Open Source Community, Natural Language Processing, Common-sense Reasoning

tau/commonsense_qa

CommonsenseQA is a multiple‑choice question‑answering dataset that requires various types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, primarily in English, each with one correct answer and four distractors. The dataset is split into training, validation, and test sets.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 4, 2024
Dataset Overview

Basic Information

  • Name: CommonsenseQA
  • Language: English (en)
  • License: MIT
  • Multilinguality: Monolingual
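  • Size Category: 1K<n<10K (as tagged on the source card; the data splits listed under Dataset Structure total 12,102 samples)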
  • Source Data: Original data
  • Task Category: Question Answering
  • Task ID: open-domain-qa
  • Papers with Code ID: commonsenseqa
  • Display Name: CommonsenseQA

Dataset Structure

  • Features:
    • id: String type, unique ID.
    • question: String type, question description.
    • question_concept: String type, concept related to the question.
    • choices: Dictionary type, containing the option labels and option texts as parallel lists.
      • label: List of strings, the option labels.
      • text: List of strings, the option texts.
    • answerKey: String type, label of the correct option.
  • Data Splits:
    • train: 9,741 samples, 2,207,794 bytes.
    • validation: 1,221 samples, 273,848 bytes.
    • test: 1,140 samples, 257,842 bytes.
    • Total Download Size: 1,558,570 bytes.
    • Total Dataset Size: 2,739,484 bytes.
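
The record layout above can be illustrated with a minimal Python sketch. It assumes that choices holds two parallel lists (label and text) and that answerKey matches one of the labels. The helper names format_prompt and answer_text are hypothetical, and the sample record is made up for illustration; it is not taken from the dataset.

```python
from typing import Optional

# Split sizes as listed under Data Splits.
SPLIT_SIZES = {"train": 9741, "validation": 1221, "test": 1140}


def format_prompt(record: dict) -> str:
    """Render a record as the question followed by one labelled option per line."""
    choices = record["choices"]
    lines = [record["question"]]
    for label, text in zip(choices["label"], choices["text"]):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


def answer_text(record: dict) -> Optional[str]:
    """Return the text of the option whose label equals answerKey, or None if no label matches."""
    choices = record["choices"]
    for label, text in zip(choices["label"], choices["text"]):
        if label == record.get("answerKey"):
            return text
    return None


# Made-up record for illustration only (assumed shape, not real data).
example = {
    "id": "example-0001",
    "question": "Where would you most likely keep milk cold?",
    "question_concept": "milk",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["oven", "refrigerator", "bookshelf", "garden", "mailbox"],
    },
    "answerKey": "B",
}

if __name__ == "__main__":
    print(format_prompt(example))
    print("Answer:", answer_text(example))
    print("Total samples:", sum(SPLIT_SIZES.values()))
```

Returning None for an unmatched answerKey lets the same helper be used on records where the answer is withheld or empty, rather than raising an error.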

Dataset Creation

  • Annotation Creators: Crowd‑sourced
  • Language Creators: Crowd‑sourced

Usage Considerations

  • License: MIT.

  • Citation Information:

    @inproceedings{talmor-etal-2019-commonsenseqa,
        title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
        author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
        booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
        month = jun,
        year = "2019",
        address = "Minneapolis, Minnesota",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/N19-1421",
        doi = "10.18653/v1/N19-1421",
        pages = "4149--4158",
        archivePrefix = "arXiv",
        eprint = "1811.00937",
        primaryClass = "cs",
    }
