High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.



wenhu/tab_fact

TabFact is a large-scale fact-verification dataset built on semi-structured evidence. It pairs 16k Wikipedia tables, which serve as evidence, with 118k manually annotated statements. Each statement is labeled either ENTAILED or REFUTED. The dataset is challenging because verifying a statement requires both soft linguistic reasoning and hard symbolic reasoning.
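To illustrate the "hard symbolic reasoning" side of the task, here is a minimal sketch of verifying statements against a small table with explicit operations such as counting and taking a maximum. The table, the statements, and the helper names (`count_rows`, `argmax_row`, `verify`) are hypothetical illustrations, not taken from TabFact or its storage format. The "soft linguistic reasoning" step, which maps each sentence to a program, is done by hand here.

```python
# Hypothetical TabFact-style example: the table, statements, and helpers
# are illustrative only and do not reflect the dataset's actual format.
from typing import Callable, Dict, List

Table = List[Dict[str, str]]

ENTAILED = "ENTAILED"
REFUTED = "REFUTED"

# A tiny semi-structured table: every cell is stored as a string.
table: Table = [
    {"player": "alice", "country": "norway", "wins": "3"},
    {"player": "bob", "country": "sweden", "wins": "5"},
    {"player": "carol", "country": "norway", "wins": "2"},
]


def count_rows(t: Table, column: str, value: str) -> int:
    """Symbolic 'count': number of rows whose `column` equals `value`."""
    return sum(1 for row in t if row[column] == value)


def argmax_row(t: Table, column: str) -> Dict[str, str]:
    """Symbolic 'argmax': the row with the largest numeric value in `column`."""
    return max(t, key=lambda row: float(row[column]))


def verify(check: Callable[[Table], bool], t: Table) -> str:
    """Map the boolean result of a hand-written program onto the two labels."""
    return ENTAILED if check(t) else REFUTED


# Statement: "two players are from norway" -> count operation.
s1 = verify(lambda t: count_rows(t, "country", "norway") == 2, table)

# Statement: "alice has the most wins" -> argmax operation (bob has 5 wins).
s2 = verify(lambda t: argmax_row(t, "wins")["player"] == "alice", table)

print(s1, s2)
```

Running this prints `ENTAILED REFUTED`. In the real task a model must produce such programs (or an equivalent judgment) directly from the natural-language statement, which is why both kinds of reasoning are needed.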

Source: Hugging Face


atmallen/popqa-parents-lying-non-err

Categories: Public Opinion Analysis, Fact Verification

The dataset has three features: text, label, and true_label. The label field is binary (false/true), and true_label is an integer. The data are split into training (23,952 samples), validation (5,136 samples), and test (5,160 samples) sets, for 34,248 samples in total. The reported dataset size is 3,464,362.5 bytes (about 3.46 MB), and the download size is 544,025 bytes (about 544 kB).
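The split sizes and byte counts above can be checked with a few lines of arithmetic. This sketch only uses the numbers reported in the listing; the variable names are illustrative.

```python
# Split sizes and byte counts as reported in the listing above.
splits = {"train": 23_952, "validation": 5_136, "test": 5_160}
total = sum(splits.values())

# Fraction of all samples that falls in each split.
fractions = {name: n / total for name, n in splits.items()}

# Ratio of the reported dataset size to the reported download size.
dataset_bytes = 3_464_362.5
download_bytes = 544_025
ratio = dataset_bytes / download_bytes

for name, frac in fractions.items():
    print(f"{name}: {splits[name]:,} ({frac:.1%})")
print(f"total: {total:,}")
print(f"dataset/download size ratio: {ratio:.2f}")
```

The output shows roughly a 70/15/15 split (69.9%, 15.0%, 15.1%) over 34,248 samples, and a dataset size about 6.37 times the download size.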

Source: Hugging Face
