commonsense_qa
CommonsenseQA is a new multiple‑choice QA dataset that requires using various types of commonsense knowledge to predict the correct answer. The dataset provides two main train/validation/test splits: 'random split' and 'question‑label split' (see the paper for details). It contains a training set (9,741 samples), a validation set (1,221 samples), and a test set (1,140 samples). Each sample includes a unique ID, question text, question concept, options (label and text), and an answer key. The dataset is in English and is released under the MIT license.
Description
Dataset Overview
Dataset Description
- Name: CommonsenseQA
- Language: English (en)
- License: MIT
- Multilinguality: Monolingual
- Size Category: 1K<n<10K
- Source Dataset: Original data
- Task Category: Question Answering
- Task ID: Open‑domain QA
- PapersWithCode ID: commonsenseqa
- Alias: CommonsenseQA
Dataset Structure
Features
- id (string): Unique ID
- question (string): Question
- question_concept (string): ConceptNet concept related to the question
- choices (dictionary):
  - label (string): Option label
  - text (string): Option text
- answerKey (string): Answer
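To make the feature layout concrete, here is a minimal Python sketch of one record and two small helpers. It assumes that `choices` holds two parallel lists (`label` and `text`), which the card does not state explicitly. The record itself is invented for illustration and is not taken from the dataset, and the helper names `format_prompt` and `answer_text` are hypothetical.

```python
from typing import List, TypedDict


class Choices(TypedDict):
    # Assumption: labels and texts are parallel lists of equal length.
    label: List[str]
    text: List[str]


class Sample(TypedDict):
    id: str
    question: str
    question_concept: str
    choices: Choices
    answerKey: str


def format_prompt(sample: Sample) -> str:
    """Render the question followed by one 'LABEL. text' line per option."""
    options = zip(sample["choices"]["label"], sample["choices"]["text"])
    lines = [sample["question"]] + [f"{label}. {text}" for label, text in options]
    return "\n".join(lines)


def answer_text(sample: Sample) -> str:
    """Look up the option text that corresponds to answerKey."""
    index = sample["choices"]["label"].index(sample["answerKey"])
    return sample["choices"]["text"][index]


# Invented record, for illustration only (not from the dataset).
example: Sample = {
    "id": "example-0001",
    "question": "Where would you most likely find a pillow?",
    "question_concept": "pillow",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["bed", "river", "oven", "mailbox", "highway"],
    },
    "answerKey": "A",
}

if __name__ == "__main__":
    print(format_prompt(example))
    print("Answer:", answer_text(example))
```

Keeping the label-to-text lookup in one helper avoids relying on option order when scoring predictions.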
Splits
- train
  - Bytes: 2,207,794
  - Samples: 9,741
- validation
  - Bytes: 273,848
  - Samples: 1,221
- test
  - Bytes: 257,842
  - Samples: 1,140
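As a quick sanity check on the figures above, the following sketch computes the total sample count, each split's share of it, and the average bytes per sample. The numbers are copied from the split listing. The `SPLITS` dictionary and the function names are illustrative and not part of any dataset API.

```python
# Figures copied from the split listing in this card.
SPLITS = {
    "train": {"bytes": 2_207_794, "samples": 9_741},
    "validation": {"bytes": 273_848, "samples": 1_221},
    "test": {"bytes": 257_842, "samples": 1_140},
}


def total_samples(splits: dict) -> int:
    """Sum the sample counts over all splits."""
    return sum(info["samples"] for info in splits.values())


def split_fractions(splits: dict) -> dict:
    """Return each split's share of the total sample count."""
    total = total_samples(splits)
    return {name: info["samples"] / total for name, info in splits.items()}


def bytes_per_sample(splits: dict) -> dict:
    """Return the average number of bytes per sample in each split."""
    return {name: info["bytes"] / info["samples"] for name, info in splits.items()}


if __name__ == "__main__":
    print("total samples:", total_samples(SPLITS))
    for name, share in split_fractions(SPLITS).items():
        print(f"{name}: {share:.1%} of samples, "
              f"{bytes_per_sample(SPLITS)[name]:.0f} bytes per sample on average")
```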
Configurations
- default
  - Data files:
    - train: data/train-*
    - validation: data/validation-*
    - test: data/test-*
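The data-file entries of the default configuration are glob patterns, one per split. A small sketch of how such patterns map file paths to splits, using only the Python standard library, might look like the following. The shard file names are hypothetical (the card lists only the patterns, not the actual files), and `split_for` is an illustrative helper rather than a library function.

```python
from fnmatch import fnmatch
from typing import Optional

# Patterns copied from the default configuration in this card.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches the path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for path in [
        "data/train-00000-of-00001.parquet",
        "data/validation-00000-of-00001.parquet",
        "data/test-00000-of-00001.parquet",
        "README.md",
    ]:
        print(path, "->", split_for(path))
```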
Dataset Creation
License Information
The dataset is released under the MIT license.
Citation Information
@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
    archivePrefix = "arXiv",
    eprint = "1811.00937",
    primaryClass = "cs",
}
Source
Organization: huggingface
Created: 7/22/2024