Dataset asset, Open Source Community, Natural Language Processing, Fact Verification
wenhu/tab_fact
TabFact is a large-scale dataset for fact verification over semi-structured evidence. It pairs 16k Wikipedia tables, used as evidence, with 118k manually annotated statements, each labeled ENTAILED or REFUTED. The dataset is challenging because it requires both soft linguistic reasoning and hard symbolic reasoning.
Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 18, 2024
Dataset Overview
- Name: TabFact
- Language: English (en)
- License: CC‑BY‑4.0
- Multilinguality: Monolingual
- Size: 100 K < size < 1 M
- Source: Original data
- Task Category: Text Classification
- Task ID: Fact‑checking
- Paper/Code ID: tabfact
- Pretty Name: TabFact
Structure
Config: tab_fact
- Features:
  - id: int32
  - table_id: string
  - table_text: string
  - table_caption: string
  - statement: string
  - label: class_label with names
    - 0: refuted
    - 1: entailed
- Splits:
  - train: num_bytes 99,852,664; num_examples 92,283
  - validation: num_bytes 13,846,872; num_examples 12,792
  - test: num_bytes 13,493,391; num_examples 12,779
- download_size: 196,508,436 bytes
- dataset_size: 127,192,927 bytes
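To make the schema concrete, the sketch below models one `tab_fact` record as a plain Python dictionary and decodes it. It assumes that `table_text` stores one table row per line, with `#` separating cells and the header on the first line. This card does not state that layout, so verify it against a real record before relying on it. The sample record and the helper names `parse_table` and `label_name` are hypothetical and exist only for illustration.

```python
# Label ids as listed in the feature description: 0 = refuted, 1 = entailed.
LABEL_NAMES = ["refuted", "entailed"]


def label_name(label_id: int) -> str:
    """Map an integer class label to its name."""
    return LABEL_NAMES[label_id]


def parse_table(table_text: str, cell_sep: str = "#") -> list[dict[str, str]]:
    """Split table_text into row dicts keyed by the header cells.

    Assumption: one row per line, cells separated by `cell_sep`,
    header on the first line.
    """
    lines = [line for line in table_text.split("\n") if line.strip()]
    header = lines[0].split(cell_sep)
    return [dict(zip(header, line.split(cell_sep))) for line in lines[1:]]


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "id": 0,
    "table_id": "example.html.csv",
    "table_text": "team#wins#losses\nlions#10#2\ntigers#7#5\n",
    "table_caption": "example league standings",
    "statement": "the lions have more wins than the tigers",
    "label": 1,
}

rows = parse_table(example["table_text"])
print(rows[0]["team"], rows[0]["wins"])  # lions 10
print(label_name(example["label"]))      # entailed
```

All cell values stay strings after parsing. A statement such as the one above needs numeric comparison, so a verification model or program would have to convert cells like `wins` itself.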
Config: blind_test
- Features:
  - id: int32
  - table_id: string
  - table_text: string
  - table_caption: string
  - statement: string
  - test_id: string
- Splits:
  - test: num_bytes 10,954,442; num_examples 9,750
- download_size: 196,508,436 bytes
- dataset_size: 10,954,442 bytes
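As a quick consistency check on the figures listed for both configs, the following sketch sums the per-split byte and example counts and compares the byte totals with the reported `dataset_size`. All numbers are copied from this card. The names `SPLITS`, `DATASET_SIZE`, and `totals` are hypothetical.

```python
# (num_bytes, num_examples) per split, copied from the split listings.
SPLITS = {
    "tab_fact": {
        "train": (99_852_664, 92_283),
        "validation": (13_846_872, 12_792),
        "test": (13_493_391, 12_779),
    },
    "blind_test": {
        "test": (10_954_442, 9_750),
    },
}

# Reported dataset_size per config, in bytes.
DATASET_SIZE = {"tab_fact": 127_192_927, "blind_test": 10_954_442}


def totals(config: str) -> tuple[int, int]:
    """Return (total_bytes, total_examples) summed over a config's splits."""
    splits = SPLITS[config].values()
    return sum(b for b, _ in splits), sum(n for _, n in splits)


for config in SPLITS:
    total_bytes, total_examples = totals(config)
    matches = total_bytes == DATASET_SIZE[config]
    print(f"{config}: {total_bytes:,} bytes, {total_examples:,} examples, "
          f"matches dataset_size: {matches}")
# tab_fact: 127,192,927 bytes, 117,854 examples, matches dataset_size: True
# blind_test: 10,954,442 bytes, 9,750 examples, matches dataset_size: True
```

The 117,854 labeled examples in `tab_fact` correspond to the roughly 118k statements mentioned in the description. The `blind_test` config carries a `test_id` instead of a `label`.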
Creation
- Annotation Creators: Crowdsourced
- Language Creators: Crowdsourced