High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

allenai/winogrande

WinoGrande is a dataset of 44,000 problems inspired by the Winograd Schema Challenge, scaled up and adjusted to improve robustness against dataset‑specific biases. Each problem is a fill‑in‑the‑blank (cloze) sentence with two candidate options; the goal is to select the option that correctly completes the sentence, which requires commonsense reasoning.
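The two-option cloze task described above can be illustrated with a small sketch. The record layout (`sentence`, `option1`, `option2`, `answer`), the `_` blank marker, and the example sentence are assumptions made for illustration; they are not taken from this listing, so check the dataset card before relying on them.

```python
from dataclasses import dataclass


@dataclass
class ClozeProblem:
    """Hypothetical record for a two-option cloze problem (assumed layout)."""
    sentence: str  # assumed to contain a single "_" marking the blank
    option1: str
    option2: str
    answer: str    # assumed to be "1" or "2", naming the correct option


def fill_blank(problem: ClozeProblem, choice: str) -> str:
    """Return the sentence with the blank replaced by the chosen option."""
    if choice not in ("1", "2"):
        raise ValueError("choice must be '1' or '2'")
    option = problem.option1 if choice == "1" else problem.option2
    return problem.sentence.replace("_", option, 1)


def is_correct(problem: ClozeProblem, choice: str) -> bool:
    """Check a predicted choice against the stored answer."""
    return choice == problem.answer


# Illustrative, hand-written problem in the Winograd style (not a dataset row).
example = ClozeProblem(
    sentence="The trophy did not fit in the suitcase because the _ was too large.",
    option1="trophy",
    option2="suitcase",
    answer="1",
)

print(fill_blank(example, "1"))
print(is_correct(example, "2"))
```

A model would typically score both filled-in sentences and pick the more plausible one; `is_correct` then compares that pick with the label.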

hugging_face

commonsense_qa

Question Answering Systems

Common-sense Reasoning

CommonsenseQA is a multiple‑choice question answering dataset that requires different types of commonsense knowledge to predict the correct answer. It is provided in two main train/validation/test splits: 'random split' and 'question‑label split' (see the paper for details). It contains a training set (9,741 samples), a validation set (1,221 samples), and a test set (1,140 samples). Each sample includes a unique ID, the question text, the question concept, the answer options (each with a label and text), and an answer key. The dataset is in English and is released under the MIT license.
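As a rough sketch of the sample layout listed above, the record below models the ID, question text, question concept, labeled options, and answer key. The class and field names, the five-option example, and its content are hypothetical and hand-written for illustration; only the split sizes come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Choice:
    label: str  # option label, e.g. "A"
    text: str   # option text


@dataclass
class Sample:
    """Hypothetical in-memory form of one sample (names are assumptions)."""
    id: str
    question: str
    question_concept: str
    choices: List[Choice]
    answer_key: str  # label of the correct option


def answer_text(sample: Sample) -> str:
    """Resolve the answer key to the text of the matching option."""
    for choice in sample.choices:
        if choice.label == sample.answer_key:
            return choice.text
    raise KeyError(f"answer key {sample.answer_key!r} not among the options")


# Split sizes as stated in the description.
SPLIT_SIZES = {"train": 9741, "validation": 1221, "test": 1140}
TOTAL = sum(SPLIT_SIZES.values())  # 12,102 samples overall

# Illustrative, hand-written sample (not a dataset row).
sample = Sample(
    id="example-0",
    question="Where would you most likely store a loaf of bread at home?",
    question_concept="bread",
    choices=[
        Choice("A", "pantry"),
        Choice("B", "garage"),
        Choice("C", "bathtub"),
        Choice("D", "mailbox"),
        Choice("E", "attic"),
    ],
    answer_key="A",
)

print(answer_text(sample))
print(TOTAL)
```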

hugging_face

tau/commonsense_qa

Common-sense Reasoning

Natural Language Processing

CommonsenseQA is a multiple‑choice question answering dataset that requires various types of commonsense knowledge to predict the correct answer. It contains 12,102 questions, each with one correct answer and four distractors. The dataset is split into training, validation, and test sets, and its text is primarily in English.

hugging_face
