Tags: Dataset asset, Open Source Community, Natural Language Processing, Fact Verification

wenhu/tab_fact

TabFact is a large-scale fact verification dataset built on semi-structured evidence: about 16k Wikipedia tables serve as evidence for about 118k manually annotated statements. Each statement is labeled ENTAILED or REFUTED with respect to its table. The dataset is challenging because verifying a statement requires both soft linguistic reasoning (interpreting how the statement is phrased) and hard symbolic reasoning (operations over table cells, such as counting or comparison).
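
To make the "hard symbolic reasoning" side concrete, here is a minimal Python sketch that checks two statements against a small table. The table, the statements, and the helpers `count_rows_where` and `verdict` are invented for illustration and are not taken from TabFact. In the real task the statement is free-form text, so a system must first work out which operation the wording implies (the "soft" linguistic part).

```python
def count_rows_where(table, column, predicate):
    """Count rows whose value in `column` satisfies `predicate`."""
    return sum(1 for row in table if predicate(row[column]))


def verdict(claim_holds):
    """Turn a boolean check into one of the two TabFact labels."""
    return "ENTAILED" if claim_holds else "REFUTED"


# Invented toy table (hypothetical data, for illustration only).
toy_table = [
    {"team": "a", "wins": 10},
    {"team": "b", "wins": 4},
    {"team": "c", "wins": 7},
]

# Statement 1: "exactly two teams have more than 5 wins" (counting).
result_count = verdict(count_rows_where(toy_table, "wins", lambda w: w > 5) == 2)

# Statement 2: "team b has the most wins" (superlative comparison).
top_team = max(toy_table, key=lambda row: row["wins"])["team"]
result_max = verdict(top_team == "b")

print(result_count, result_max)
```

Here the first statement is entailed and the second is refuted. Both checks are trivial once the operation is known; identifying it from natural language is the harder step.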

Source: hugging_face
Created: Nov 28, 2025
Updated: Jan 18, 2024

Dataset Overview

  • Name: TabFact
  • Language: English (en)
  • License: CC‑BY‑4.0
  • Multilinguality: Monolingual
  • Size category: between 100K and 1M examples
  • Source: Original data
  • Task Category: Text Classification
  • Task ID: Fact‑checking
  • Papers with Code ID: tabfact
  • Pretty Name: TabFact

Structure

Config: tab_fact

  • Features:
    • id: int32
    • table_id: string
    • table_text: string
    • table_caption: string
    • statement: string
    • label:
      • class_label:
        • names:
          • 0: refuted
          • 1: entailed
  • Splits:
    • train: num_bytes 99,852,664; num_examples 92,283
    • validation: num_bytes 13,846,872; num_examples 12,792
    • test: num_bytes 13,493,391; num_examples 12,779
  • download_size: 196,508,436 bytes
  • dataset_size: 127,192,927 bytes
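
As a minimal sketch of how a record with the features above might be handled in Python: the label mapping (0 = refuted, 1 = entailed) comes from the schema, while the `TabFactExample` dataclass and the `parse_table` and `label_name` helpers are hypothetical names introduced here. The sketch also assumes that `table_text` separates rows with newlines and cells with `#`, which should be verified against real records. The sample record is invented.

```python
from dataclasses import dataclass

# Label names as listed in the schema above (list index = class id).
LABEL_NAMES = ["refuted", "entailed"]


@dataclass
class TabFactExample:
    """Hypothetical container mirroring the tab_fact features."""
    id: int
    table_id: str
    table_text: str
    table_caption: str
    statement: str
    label: int


def parse_table(table_text, cell_sep="#"):
    """Split a flattened table into rows of cells.

    Assumption: rows are newline-separated and cells are separated by
    `cell_sep`; check this against actual data before relying on it.
    """
    return [line.split(cell_sep) for line in table_text.splitlines() if line]


def label_name(label):
    """Map a class id (0 or 1) to its name."""
    return LABEL_NAMES[label]


# Invented record, for illustration only (not taken from the dataset).
example = TabFactExample(
    id=0,
    table_id="toy.csv",
    table_text="player#goals\nalice#3\nbob#1\n",
    table_caption="toy scoring table",
    statement="alice scored more goals than bob",
    label=1,
)

rows = parse_table(example.table_text)
header, body = rows[0], rows[1:]
print(header, body, label_name(example.label))
```

With the Hugging Face `datasets` library, records like this would typically be obtained with `load_dataset("wenhu/tab_fact", "tab_fact")`. Depending on the library version, additional options may be needed.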

Config: blind_test

  • Features:
    • id: int32
    • table_id: string
    • table_text: string
    • table_caption: string
    • statement: string
    • test_id: string
  • Splits:
    • test: num_bytes 10,954,442; num_examples 9,750
  • download_size: 196,508,436 bytes
  • dataset_size: 10,954,442 bytes
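
Because blind_test records carry a `test_id` but no `label`, predictions have to be keyed by that identifier. The following is a rough sketch of one way to collect them. The `collect_predictions` helper, the placeholder classifier `always_entailed`, the sample records, and the JSON output layout are all assumptions made for illustration, not an official submission format.

```python
import json

# Label names from the tab_fact config (list index = class id).
LABEL_NAMES = ["refuted", "entailed"]


def collect_predictions(examples, predict):
    """Return {test_id: label_name} for unlabeled blind_test records.

    `predict` is any callable that takes one record (a dict) and
    returns a class id, 0 or 1.
    """
    return {ex["test_id"]: LABEL_NAMES[predict(ex)] for ex in examples}


def always_entailed(example):
    """Placeholder classifier (hypothetical): predicts class 1 for every record."""
    return 1


# Invented records with the blind_test fields (illustration only).
blind_examples = [
    {"id": 0, "table_id": "toy.csv", "table_text": "player#goals\nalice#3\n",
     "table_caption": "toy scoring table",
     "statement": "alice scored 3 goals", "test_id": "t-0"},
    {"id": 1, "table_id": "toy.csv", "table_text": "player#goals\nalice#3\n",
     "table_caption": "toy scoring table",
     "statement": "alice scored 5 goals", "test_id": "t-1"},
]

predictions = collect_predictions(blind_examples, always_entailed)
print(json.dumps(predictions, indent=2))
```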

Creation

  • Annotation Creators: Crowdsourced
  • Language Creators: Crowdsourced