wenhu/tab_fact

TabFact is a large-scale dataset for fact verification over semi-structured evidence. It pairs 16k Wikipedia tables, used as evidence, with 118k manually annotated statements, each labeled ENTAILED or REFUTED. The dataset is challenging because verifying a statement requires both soft linguistic reasoning and hard symbolic reasoning.

Updated 1/18/2024

hugging_face

Dataset Overview

Name: TabFact
Language: English (en)
License: CC-BY-4.0
Multilinguality: Monolingual
Size: 100 K < size < 1 M
Source: Original data
Task Category: Text Classification
Task ID: Fact‑checking
Papers with Code ID: tabfact
Pretty Name: TabFact

Structure

Config: tab_fact

Features:
- id: int32
- table_id: string
- table_text: string
- table_caption: string
- statement: string
- label:
  - class_label:
    - names:
      - 0: refuted
      - 1: entailed
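
The feature list above can be mirrored in a small typed record, which makes the label encoding (0 = refuted, 1 = entailed) explicit. The following is a minimal sketch, not the dataset's own loading code. The class name `TabFactExample`, the helpers `label_name` and `parse_table`, and the sample row are all hypothetical. The `#` cell separator in `parse_table` is an assumption about how `table_text` is serialized and should be checked against real rows before use.

```python
from dataclasses import dataclass
from typing import List

# Label names in index order, as listed in the feature schema.
LABEL_NAMES = ["refuted", "entailed"]


@dataclass
class TabFactExample:
    """One row of the tab_fact config (field names taken from the schema)."""
    id: int
    table_id: str
    table_text: str
    table_caption: str
    statement: str
    label: int  # 0 = refuted, 1 = entailed


def label_name(label: int) -> str:
    """Map an integer class label to its name."""
    if label not in (0, 1):
        raise ValueError(f"unexpected label: {label}")
    return LABEL_NAMES[label]


def parse_table(table_text: str, cell_sep: str = "#") -> List[List[str]]:
    """Split a serialized table into rows of cells.

    ASSUMPTION: rows are newline-separated and cells are separated by
    `cell_sep`; this is not stated in the schema above.
    """
    return [line.split(cell_sep) for line in table_text.splitlines() if line]


# Made-up example row, for illustration only.
example = TabFactExample(
    id=0,
    table_id="example.csv",
    table_text="player#goals\nalice#3\nbob#1",
    table_caption="hypothetical scoring table",
    statement="alice scored more goals than bob",
    label=1,
)
rows = parse_table(example.table_text)
```

Keeping the label as an integer and decoding it only for display matches the class-label encoding in the schema.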
Splits:
- train: num_bytes 99,852,664; num_examples 92,283
- validation: num_bytes 13,846,872; num_examples 12,792
- test: num_bytes 13,493,391; num_examples 12,779
Download size: 196,508,436 bytes
Dataset size: 127,192,927 bytes

Config: blind_test

Features:
- id: int32
- table_id: string
- table_text: string
- table_caption: string
- statement: string
- test_id: string
Splits:
- test: num_bytes 10,954,442; num_examples 9,750
Download size: 196,508,436 bytes
Dataset size: 10,954,442 bytes
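
Unlike tab_fact, the blind_test config has no label field, and each row carries a test_id instead. One plausible way to work with it is to key predictions by test_id, as sketched below. This is an assumption about usage, not a documented submission format: the function `predict_blind`, the placeholder classifier `always_refuted`, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Mapping


def predict_blind(
    rows: Iterable[Mapping[str, object]],
    classify: Callable[[str, str], int],
) -> Dict[str, int]:
    """Return {test_id: predicted_label} for unlabeled blind_test rows.

    `classify` takes (table_text, statement) and returns 0 (refuted)
    or 1 (entailed), following the tab_fact label encoding.
    """
    predictions: Dict[str, int] = {}
    for row in rows:
        if "label" in row:
            # blind_test rows are expected to be unlabeled.
            raise ValueError("row has a label; expected a blind_test row")
        predictions[str(row["test_id"])] = classify(
            str(row["table_text"]), str(row["statement"])
        )
    return predictions


def always_refuted(table_text: str, statement: str) -> int:
    """Placeholder classifier standing in for a real model."""
    return 0


# Made-up rows shaped like the blind_test schema, for illustration only.
sample_rows = [
    {"id": 0, "table_id": "a.csv", "table_text": "x#y\n1#2",
     "table_caption": "demo", "statement": "x is 1", "test_id": "t-0"},
    {"id": 1, "table_id": "a.csv", "table_text": "x#y\n1#2",
     "table_caption": "demo", "statement": "y is 3", "test_id": "t-1"},
]
predictions = predict_blind(sample_rows, always_refuted)
```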

Creation

Annotation Creators: Crowdsourced
Language Creators: Crowdsourced
