Explore high-quality datasets for your AI and machine learning projects.
WinoGrande is a dataset of 44,000 problems inspired by the Winograd Schema Challenge, scaled up and adjusted to improve robustness against dataset-specific biases. Each problem is a fill-in-the-blank (cloze) sentence with two options, and the goal is to choose the option that correctly completes the sentence, which requires commonsense reasoning.
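To make the two-option cloze format concrete, here is a minimal sketch. It assumes each record holds a sentence with a single `_` marking the blank, two option strings, and an answer key of "1" or "2". The field names, the `fill_blank` and `accuracy` helpers, and the example sentence are hypothetical illustrations, not the dataset's documented schema.

```python
from dataclasses import dataclass


@dataclass
class WinoGrandeItem:
    # Hypothetical field names for illustration; check the actual release.
    sentence: str  # contains a single "_" marking the blank
    option1: str
    option2: str
    answer: str  # "1" or "2"


def fill_blank(item: WinoGrandeItem, choice: str) -> str:
    """Return the sentence with the blank replaced by option 1 or option 2."""
    if choice not in ("1", "2"):
        raise ValueError(f"choice must be '1' or '2', got {choice!r}")
    option = item.option1 if choice == "1" else item.option2
    return item.sentence.replace("_", option, 1)


def accuracy(items: list[WinoGrandeItem], predictions: list[str]) -> float:
    """Fraction of items whose predicted choice matches the answer key."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(pred == item.answer for item, pred in zip(items, predictions))
    return correct / len(items)


# Hypothetical example in the style of a Winograd schema.
example = WinoGrandeItem(
    sentence="The trophy doesn't fit into the brown suitcase because the _ is too large.",
    option1="trophy",
    option2="suitcase",
    answer="1",
)
print(fill_blank(example, example.answer))
```

A model is scored by picking one of the two filled-in sentences, so accuracy against the answer key is the natural metric for this format.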
CommonsenseQA is a multiple-choice question answering dataset that requires several types of commonsense knowledge to predict the correct answer. It provides two main train/validation/test splits, the 'random split' and the 'question-label split' (see the paper for details). It contains a training set of 9,741 samples, a validation set of 1,221 samples, and a test set of 1,140 samples. Each sample includes a unique ID, the question text, the question concept, the answer options (labels and texts), and an answer key. The dataset is in English and is released under the MIT license.
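The sample layout described above can be sketched as a plain Python dictionary. The key names (`id`, `question`, `question_concept`, `choices`, `answerKey`), the five A–E labels, and the example question are assumptions made for illustration. The split sizes are the ones stated in the description.

```python
# Hypothetical record; key names follow the description above but are assumptions.
record = {
    "id": "example-0001",
    "question": "Where would you most likely store a winter coat during summer?",
    "question_concept": "winter coat",
    "choices": {
        "label": ["A", "B", "C", "D", "E"],
        "text": ["closet", "beach", "oven", "bathtub", "garden"],
    },
    "answerKey": "A",
}


def answer_text(rec: dict) -> str:
    """Look up the text of the option whose label matches the answer key."""
    labels = rec["choices"]["label"]
    texts = rec["choices"]["text"]
    return texts[labels.index(rec["answerKey"])]


def distractors(rec: dict) -> list[str]:
    """Return the option texts that are not the correct answer."""
    correct = rec["answerKey"]
    pairs = zip(rec["choices"]["label"], rec["choices"]["text"])
    return [text for label, text in pairs if label != correct]


# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 9741, "validation": 1221, "test": 1140}

print(answer_text(record))        # closet
print(len(distractors(record)))   # 4
print(sum(SPLIT_SIZES.values()))  # 12102
```

Keeping labels and texts as parallel lists means the answer key is resolved by index lookup, and summing the three splits gives 12,102 samples in total.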
CommonsenseQA is a multiple-choice question answering dataset that requires various types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractors, divided into training, validation, and test sets. The data is primarily in English.