TabFact is a large-scale dataset comprising 16k Wikipedia tables as evidence and 118k manually annotated statements for fact verification based on semi-structured evidence. Each statement is labeled as either ENTAILED or REFUTED. The dataset is challenging because it requires both soft linguistic reasoning and hard symbolic reasoning.
The dataset includes three features: text, label, and true_label. The label field is binary (false/true), and the true_label field is an integer. The data are split into training (23,952 samples), validation (5,136 samples), and test (5,160 samples) sets. The total dataset size is listed as 3,464,362.5 bytes, and the download size as 544,025 bytes.
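To make the split sizes easier to compare, here is a minimal Python sketch that totals the three sample counts quoted above and works out each split's share. The names `SPLIT_SIZES` and `split_fractions` are illustrative and are not part of any dataset tooling. The only inputs are the counts stated in this description.

```python
# Sample counts per split, copied from the description above.
SPLIT_SIZES = {"training": 23_952, "validation": 5_136, "test": 5_160}


def split_fractions(sizes):
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(f"total samples: {total}")
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

The counts sum to 34,248 samples, which works out to roughly a 70/15/15 split across training, validation, and test.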