wenhu/tab_fact
TabFact is a large‑scale dataset comprising 16 k Wikipedia tables as evidence and 118 k manually annotated statements for fact verification based on semi‑structured evidence. Statements are labeled as ENTAILED or REFUTED. The dataset is challenging because it requires both soft linguistic reasoning and hard symbolic reasoning.
Updated 1/18/2024
hugging_face
Description
Dataset Overview
- Name: TabFact
- Language: English (en)
- License: CC‑BY‑4.0
- Multilinguality: Monolingual
- Size: 100 K < size < 1 M
- Source: Original data
- Task Category: Text Classification
- Task ID: Fact‑checking
- Paper/Code ID: tabfact
- Pretty Name: TabFact
Structure
Config: tab_fact
- Features:
- id: int32
- table_id: string
- table_text: string
- table_caption: string
- statement: string
- label: class_label (names: 0 = refuted, 1 = entailed)
- Splits:
- train: num_bytes 99,852,664; num_examples 92,283
- validation: num_bytes 13,846,872; num_examples 12,792
- test: num_bytes 13,493,391; num_examples 12,779
- download_size: 196,508,436
- dataset_size: 127,192,927
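As a quick consistency check on the figures above, the following sketch sums the per-split counts and byte sizes and compares them with the listed dataset_size. All numbers are copied from the listing; the names SPLITS, DATASET_SIZE, and split_fractions are illustrative only and not part of any library.

```python
# Split statistics copied from the tab_fact config listing above.
SPLITS = {
    "train": {"num_bytes": 99_852_664, "num_examples": 92_283},
    "validation": {"num_bytes": 13_846_872, "num_examples": 12_792},
    "test": {"num_bytes": 13_493_391, "num_examples": 12_779},
}
DATASET_SIZE = 127_192_927  # dataset_size as listed for this config

# Totals across the three splits.
total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())


def split_fractions(splits):
    """Return each split's share of the total number of examples."""
    total = sum(s["num_examples"] for s in splits.values())
    return {name: s["num_examples"] / total for name, s in splits.items()}


fractions = split_fractions(SPLITS)

if __name__ == "__main__":
    print("examples:", total_examples)
    print("bytes match dataset_size:", total_bytes == DATASET_SIZE)
    for name, frac in fractions.items():
        print(f"{name}: {frac:.1%}")
```

The three splits add up to 117,854 statements, which is the "118 k" quoted in the summary, and the byte counts add up exactly to dataset_size. The split is roughly 78% train, 11% validation, and 11% test.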
Config: blind_test
- Features:
- id: int32
- table_id: string
- table_text: string
- table_caption: string
- statement: string
- test_id: string
- Splits:
- test: num_bytes 10,954,442; num_examples 9,750
- download_size: 196,508,436
- dataset_size: 10,954,442
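The two configs share most fields and differ only in label (tab_fact) versus test_id (blind_test). A minimal sketch of how a record from either config might be represented and decoded is shown below. The class TabFactRecord, the helpers label_name and parse_table, and the example record are hypothetical and not taken from the dataset. The "#" cell separator is an assumption about how table_text is serialized and should be checked against real rows.

```python
from dataclasses import dataclass
from typing import List, Optional

# Index = class id, as given in the tab_fact feature listing.
LABEL_NAMES = ["refuted", "entailed"]


@dataclass
class TabFactRecord:
    """One row of either config (hypothetical container, not a library type)."""
    id: int
    table_id: str
    table_text: str
    table_caption: str
    statement: str
    label: Optional[int] = None    # present in the tab_fact config only
    test_id: Optional[str] = None  # present in the blind_test config only


def label_name(record: TabFactRecord) -> Optional[str]:
    """Map the integer label to its name; blind_test records have no label."""
    return None if record.label is None else LABEL_NAMES[record.label]


def parse_table(table_text: str, sep: str = "#") -> List[List[str]]:
    """Split table_text into rows of cells.

    Assumption: one row per line, cells separated by `sep`.
    """
    return [line.split(sep) for line in table_text.splitlines() if line.strip()]


# Made-up record for illustration only; it is not drawn from TabFact.
example = TabFactRecord(
    id=0,
    table_id="hypothetical.csv",
    table_text="player#team#goals\nalice#red#3\nbob#blue#5",
    table_caption="hypothetical scoring table",
    statement="bob scored more goals than alice",
    label=1,
)
blind = TabFactRecord(
    id=1,
    table_id="hypothetical.csv",
    table_text=example.table_text,
    table_caption=example.table_caption,
    statement="alice plays for the blue team",
    test_id="hypothetical-test-id",
)

rows = parse_table(example.table_text)
# A toy symbolic check of the example statement against the parsed table.
goals = {row[0]: int(row[2]) for row in rows[1:]}
statement_holds = goals["bob"] > goals["alice"]

if __name__ == "__main__":
    print(rows[0], label_name(example), label_name(blind), statement_holds)
```

The toy comparison at the end illustrates the symbolic side of the task, where a statement is checked by computing over table cells. Real statements also need linguistic interpretation before any such check can be written down.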
Creation
- Annotation Creators: Crowdsourced
- Language Creators: Crowdsourced
Topics
Fact Verification
Natural Language Processing
Source
Organization: hugging_face
Created: Unknown