commonsense_qa
CommonsenseQA is a multiple-choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. The dataset provides two train/validation/test splits, a 'random split' and a 'question-label split' (see the paper for details). It contains a training set (9,741 samples), a validation set (1,221 samples), and a test set (1,140 samples). Each sample includes a unique ID, the question text, a question concept, the answer options (label and text), and an answer key. The dataset is in English and is released under the MIT license.
Dataset Overview
Dataset Description
- Name: CommonsenseQA
- Language: English (en)
- License: MIT
- Multilinguality: Monolingual
- Size Category: 1K<n<10K
- Source Dataset: Original data
- Task Category: Question Answering
- Task ID: Open‑domain QA
- PapersWithCode ID: commonsenseqa
- Alias: CommonsenseQA
Dataset Structure
Features
- id (string): Unique ID
- question (string): Question
- question_concept (string): ConceptNet concept related to the question
- choices (dictionary):
  - label (string): Option label
  - text (string): Option text
- answerKey (string): Answer
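As a minimal sketch of how these fields fit together, the snippet below renders one record as a multiple-choice prompt and checks a predicted label against answerKey. The record is invented for illustration and is not taken from the dataset. The sketch assumes that choices holds two parallel lists, label and text; format_prompt and is_correct are hypothetical helper names, not part of any library.

```python
from typing import List, TypedDict


class Choices(TypedDict):
    label: List[str]  # assumption: option labels stored as a list, e.g. "A".."E"
    text: List[str]   # assumption: option texts in the same order as the labels


class Sample(TypedDict):
    id: str
    question: str
    question_concept: str
    choices: Choices
    answerKey: str


def format_prompt(sample: Sample) -> str:
    """Render the question followed by one 'LABEL. text' line per option."""
    lines = [sample["question"]]
    for label, text in zip(sample["choices"]["label"], sample["choices"]["text"]):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


def is_correct(sample: Sample, predicted_label: str) -> bool:
    """Compare a predicted option label with the answer key, ignoring case."""
    return predicted_label.strip().upper() == sample["answerKey"].strip().upper()


# Invented record that follows the schema above (hypothetical content).
example: Sample = {
    "id": "example-0001",
    "question": "Where would you put a wet umbrella after coming indoors?",
    "question_concept": "umbrella",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["bed", "umbrella stand", "bookshelf", "oven", "wallet"],
    },
    "answerKey": "B",
}

if __name__ == "__main__":
    print(format_prompt(example))
    print("prediction 'B' correct:", is_correct(example, "B"))
```

Keeping the label and text lists paired with zip means the prompt stays aligned with answerKey even if the number of options differs between records.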
Splits
- train
  - Bytes: 2,207,794
  - Samples: 9,741
- validation
  - Bytes: 273,848
  - Samples: 1,221
- test
  - Bytes: 257,842
  - Samples: 1,140
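The split sizes above can be turned into relative shares with a few lines of arithmetic. The sketch below uses only the byte and sample counts listed in this section; the dictionary layout and the split_shares helper are illustrative choices, not something the dataset ships.

```python
# Byte and sample counts copied from the split listing above.
SPLITS = {
    "train": {"bytes": 2_207_794, "samples": 9_741},
    "validation": {"bytes": 273_848, "samples": 1_221},
    "test": {"bytes": 257_842, "samples": 1_140},
}


def split_shares(splits: dict) -> dict:
    """Return each split's fraction of the total sample count."""
    total = sum(info["samples"] for info in splits.values())
    return {name: info["samples"] / total for name, info in splits.items()}


total_samples = sum(info["samples"] for info in SPLITS.values())  # 12,102 in total

if __name__ == "__main__":
    shares = split_shares(SPLITS)
    for name, info in SPLITS.items():
        avg_bytes = info["bytes"] / info["samples"]
        print(f"{name:<10} {shares[name]:6.1%} of samples, "
              f"about {avg_bytes:.0f} bytes per sample")
```

Roughly four out of five samples fall in the training split, with the remainder divided almost evenly between validation and test.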
Configurations
- default
  - Data files:
    - train: data/train-*
    - validation: data/validation-*
    - test: data/test-*
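To illustrate how glob patterns like these assign files to splits, here is a small sketch using Python's standard fnmatch module. The shard file names are hypothetical, since the listing above only gives the patterns, and the way a real loader resolves data files may differ in detail; split_for is an illustrative helper.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the default configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches the path, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names, used only to exercise the patterns.
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

if __name__ == "__main__":
    for path in candidates:
        print(f"{path} -> {split_for(path)}")
```

Because each pattern ends in a wildcard, a split can be stored as one file or as several shards without changing the configuration.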
Dataset Creation
License Information
The dataset is released under the MIT license.
Citation Information
@inproceedings{talmor-etal-2019-commonsenseqa,
    title = "{C}ommonsense{QA}: A Question Answering Challenge Targeting Commonsense Knowledge",
    author = "Talmor, Alon and Herzig, Jonathan and Lourie, Nicholas and Berant, Jonathan",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1421",
    doi = "10.18653/v1/N19-1421",
    pages = "4149--4158",
    archivePrefix = "arXiv",
    eprint = "1811.00937",
    primaryClass = "cs",
}