commonsense_qa

CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It is provided in two train/validation/test splits, the 'random split' and the 'question-label split' (see the paper for details). The dataset contains 9,741 training samples, 1,221 validation samples, and 1,140 test samples. Each sample includes a unique ID, the question text, a question concept, the answer options (label and text), and an answer key. The dataset is in English and is released under the MIT license.

Updated 8/5/2024
huggingface

Description

Dataset Overview

Dataset Description

  • Name: CommonsenseQA
  • Language: English (en)
  • License: MIT
  • Multilinguality: Monolingual
  • Size Category: 1K<n<10K
  • Source Dataset: Original data
  • Task Category: Question Answering
  • Task ID: Open‑domain QA
  • PapersWithCode ID: commonsenseqa
  • Alias: CommonsenseQA

Dataset Structure

Features

  • id (string): Unique ID
  • question (string): Question
  • question_concept (string): ConceptNet concept related to the question
  • choices (dictionary):
    • label (string): Option label
    • text (string): Option text
  • answerKey (string): Label of the correct option
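
The field list above can be made concrete with a small Python sketch. It assumes that `choices` stores two parallel lists, one of labels and one of option texts, which is how the Hugging Face release appears to lay it out; the card itself only gives the element types. The record below is hypothetical and was invented for illustration, and `answer_text` is a helper name introduced here, not part of the dataset.

```python
from typing import List, Optional, TypedDict


class Choices(TypedDict):
    # Assumption: labels and texts are parallel lists of equal length.
    label: List[str]
    text: List[str]


class Sample(TypedDict):
    id: str
    question: str
    question_concept: str
    choices: Choices
    answerKey: str


def answer_text(sample: Sample) -> Optional[str]:
    """Return the option text named by answerKey, or None if no label matches."""
    by_label = dict(zip(sample["choices"]["label"], sample["choices"]["text"]))
    return by_label.get(sample["answerKey"])


# Hypothetical record, invented for illustration (not taken from the dataset).
example: Sample = {
    "id": "example-0001",
    "question": "Where would you put a wet umbrella when you come indoors?",
    "question_concept": "umbrella",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["bed", "umbrella stand", "oven", "bookshelf", "freezer"],
    },
    "answerKey": "B",
}

print(answer_text(example))
```

Returning None for an unmatched key keeps the helper usable on records whose answer key is blank or withheld.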

Splits

  • train
    • Bytes: 2,207,794
    • Samples: 9,741
  • validation
    • Bytes: 273,848
    • Samples: 1,221
  • test
    • Bytes: 257,842
    • Samples: 1,140
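
As a quick sanity check on the figures above, the following sketch totals the sample counts and derives each split's share. The numbers are copied from the list; the variable names are illustrative only.

```python
# Sample counts as listed for each split.
SPLIT_SIZES = {"train": 9_741, "validation": 1_221, "test": 1_140}

total = sum(SPLIT_SIZES.values())
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

for name, count in SPLIT_SIZES.items():
    print(f"{name:<10} {count:>6,} {shares[name]:6.1%}")
print(f"{'total':<10} {total:>6,}")
```

The three splits add up to 12,102 samples, with roughly four fifths of them in the training set.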

Configurations

  • default
    • Data files:
      • train: data/train-*
      • validation: data/validation-*
      • test: data/test-*
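
The data file entries above are glob patterns. A minimal sketch of how such patterns map files to splits, using Python's standard `fnmatch` module; the shard file name in the example is hypothetical, and `split_of` is a helper introduced here for illustration.

```python
from fnmatch import fnmatch
from typing import Optional

# Patterns copied from the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_of(path: str) -> Optional[str]:
    """Return the split whose pattern matches path, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical shard name, used only to exercise the patterns.
print(split_of("data/train-00000-of-00001.parquet"))
print(split_of("README.md"))
```

A mapping of this shape (split name to pattern) is also what the `data_files` argument of `datasets.load_dataset` accepts, should you want to load the files with the Hugging Face `datasets` library.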

Dataset Creation

License Information

The dataset is released under the MIT license.

Citation Information

@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
    archivePrefix = "arXiv",
    eprint = "1811.00937",
    primaryClass = "cs",
}

Topics

Question Answering Systems
Common-sense Reasoning

Source

Organization: huggingface

Created: 7/22/2024
