tau/commonsense_qa

CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It contains 12,102 English questions, each with one correct answer and four distractors, and is split into training, validation, and test sets.
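The figures in this summary can be cross-checked against the split sizes given under Data Splits on this card. The short sketch below only does arithmetic on those published counts. The chance-accuracy line assumes a guesser that picks uniformly among the five options, which is an illustrative baseline and not a result reported by the source.

```python
# Split sizes as listed under "Data Splits" on this card.
SPLITS = {"train": 9_741, "validation": 1_221, "test": 1_140}

# One correct answer plus four distractors per question.
NUM_CHOICES = 5

total = sum(SPLITS.values())
shares = {name: count / total for name, count in SPLITS.items()}

# Assumption: uniform random guessing over the five options.
chance_accuracy = 1 / NUM_CHOICES

print(f"total questions: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"chance accuracy: {chance_accuracy:.0%}")
```

The split counts add up to the 12,102 questions quoted in the summary, with roughly four fifths of them in the training split.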

Updated 1/4/2024
hugging_face

Description

Dataset Overview

Basic Information

  • Name: CommonsenseQA
  • Language: English (en)
  • License: MIT
  • Multilinguality: Monolingual
  • Size: 1K<n<10K (size category as tagged; the three splits listed under Data Splits total 12,102 examples)
  • Source Data: Original data
  • Task Category: Question Answering
  • Task ID: open-domain-qa
  • Papers with Code ID: commonsenseqa
  • Display Name: CommonsenseQA

Dataset Structure

  • Features:
    • id: String type, unique ID.
    • question: String type, question description.
    • question_concept: String type, concept related to the question.
    • choices: Dictionary type, holding two parallel lists with one entry per answer option.
      • label: List of strings, option labels.
      • text: List of strings, option texts.
    • answerKey: String type, label of the correct option.
  • Data Splits:
    • train: 9,741 samples, 2,207,794 bytes.
    • validation: 1,221 samples, 273,848 bytes.
    • test: 1,140 samples, 257,842 bytes.
    • Total Download Size: 1,558,570 bytes.
    • Total Dataset Size: 2,739,484 bytes.
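The feature list above can be illustrated with a hand-written record. The sketch below assumes a column-oriented layout in which `choices["label"]` and `choices["text"]` are parallel lists, one entry per option. The record contents, the `id` value, and the helper names `format_prompt` and `answer_text` are made up for illustration and are not taken from the dataset.

```python
from typing import List, TypedDict


class Choices(TypedDict):
    label: List[str]  # option labels, one per answer option
    text: List[str]   # option texts, aligned with the labels


class Record(TypedDict):
    id: str
    question: str
    question_concept: str
    choices: Choices
    answerKey: str


# Hypothetical record that mirrors the documented fields (not real data).
sample: Record = {
    "id": "example-0001",
    "question": "Where would you put a plate after washing it?",
    "question_concept": "plate",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["oven", "cupboard", "garden", "bathtub", "mailbox"],
    },
    "answerKey": "B",
}


def format_prompt(record: Record) -> str:
    """Render the question followed by one 'LABEL. text' line per option."""
    options = zip(record["choices"]["label"], record["choices"]["text"])
    lines = [record["question"]] + [f"{label}. {text}" for label, text in options]
    return "\n".join(lines)


def answer_text(record: Record) -> str:
    """Return the option text whose label matches answerKey."""
    index = record["choices"]["label"].index(record["answerKey"])
    return record["choices"]["text"][index]


print(format_prompt(sample))
print("Answer:", answer_text(sample))
```

Note that `answer_text` raises a `ValueError` when `answerKey` is not one of the option labels, so records with an empty or missing key would need to be filtered out or handled separately.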

Dataset Creation

  • Annotation Creators: Crowd‑sourced
  • Language Creators: Crowd‑sourced

Usage Considerations

  • License: MIT.

  • Citation Information:

    @inproceedings{talmor-etal-2019-commonsenseqa,
        title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
        author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
        booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
        month = jun,
        year = "2019",
        address = "Minneapolis, Minnesota",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/N19-1421",
        doi = "10.18653/v1/N19-1421",
        pages = "4149--4158",
        archivePrefix = "arXiv",
        eprint = "1811.00937",
        primaryClass = "cs",
    }

Topics

Common-sense Reasoning
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
