
Re-DocRED-CF

Re-DocRED-CF is a counterfactual dataset for document-level relation extraction, generated by entity replacement. It contains five counterfactual variants, each with training, development, and test splits plus a mixed training set. Each example includes the document title, relation labels, entity vertex sets, tokenized sentences, and an original document ID giving its index in the seed dataset.

Updated 10/15/2024
huggingface

Description

Re‑DocRED‑CF Dataset Overview

Dataset Description

Re-DocRED-CF is a counterfactual dataset for document-level relation extraction (RE). It was created by replacing entities in the seed documents and is intended for evaluating and mitigating factual bias in document-level RE.
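The exact replacement procedure (for example, how substitute entities are chosen) is not described here. The sketch below only illustrates the mechanics of swapping one entity in a single record, assuming a DocRED-style layout in which each mention carries "sent_id", a [start, end) token span "pos", and "name", and mentions within a sentence do not overlap. The function name `replace_entity` and the toy document are hypothetical; this is not the authors' pipeline.

```python
import copy


def replace_entity(doc, entity_idx, new_tokens):
    """Return a copy of a DocRED-style `doc` in which every mention of entity
    `entity_idx` is replaced by `new_tokens`.

    Assumes (hypothetically) mentions are dicts with "sent_id", "pos"
    ([start, end) token offsets) and "name", and that mentions in the same
    sentence do not overlap.
    """
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    new_name = " ".join(new_tokens)
    targets = doc["vertexSet"][entity_idx]
    # Process target mentions from last to first; offsets are read fresh on
    # each iteration, so shifts from earlier replacements are accounted for.
    for mention in sorted(targets, key=lambda m: (m["sent_id"], m["pos"][0]), reverse=True):
        sid = mention["sent_id"]
        start, end = mention["pos"]
        delta = len(new_tokens) - (end - start)  # change in sentence length
        doc["sents"][sid][start:end] = list(new_tokens)
        # Shift every other mention in this sentence that starts after the replaced span.
        for entity in doc["vertexSet"]:
            for other in entity:
                if other is not mention and other["sent_id"] == sid and other["pos"][0] >= end:
                    other["pos"] = [other["pos"][0] + delta, other["pos"][1] + delta]
        mention["pos"] = [start, start + len(new_tokens)]
        mention["name"] = new_name
    return doc


# Hypothetical toy document (not from the dataset): entity 0 = "Alice", entity 1 = "Bob".
doc = {
    "sents": [["Alice", "met", "Bob", "and", "Alice", "left", "."]],
    "vertexSet": [
        [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"},
         {"name": "Alice", "sent_id": 0, "pos": [4, 5], "type": "PER"}],
        [{"name": "Bob", "sent_id": 0, "pos": [2, 3], "type": "PER"}],
    ],
    "labels": [],
}
cf = replace_entity(doc, 0, ["Mary", "Ann"])
print(" ".join(cf["sents"][0]))
# → Mary Ann met Bob and Mary Ann left .
```

Because entity indices and sentence IDs are unchanged, relation labels and evidence sentence IDs remain valid after the swap; only token offsets inside the affected sentences move.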

Dataset Structure

The dataset comprises five counterfactual variants, each containing the following files:

  • train.jsonl: training split
  • dev.jsonl: development split
  • test.jsonl: test split
  • train_mix.jsonl: mixed training set

Variant List

  • var-01
  • var-02
  • var-03
  • var-04
  • var-05

Data Format

Each line of a data file is one JSON example with the following fields:

  • title: document title.
  • labels: list of relations; each entry gives a relation type between a head entity and a tail entity (both referenced by their position in vertexSet) and may list supporting evidence sentences.
  • vertexSet: list of entities; each entry groups all mentions of one entity in the document, together with its entity type.
  • sents: the document's sentences, each given as a list of tokens.
  • original_doc_id: index of the example in the original seed dataset.
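To show how these fields fit together, the sketch below turns one JSONL line into readable relation triples. It assumes the original DocRED key layout, which this page does not spell out: labels with integer "h"/"t" indices into vertexSet, a relation ID under "r", and "evidence" sentence IDs; mentions with "name", "sent_id", "pos" ([start, end) token offsets), and "type". The toy record and the helper names `relation_triples` and `mention_tokens` are hypothetical.

```python
import json

# Hypothetical toy record in the assumed DocRED-style layout; not taken from the dataset.
TOY_LINE = json.dumps({
    "title": "Toy document",
    "sents": [["Alice", "was", "born", "in", "Paris", "."]],
    "vertexSet": [
        [{"name": "Alice", "sent_id": 0, "pos": [0, 1], "type": "PER"}],
        [{"name": "Paris", "sent_id": 0, "pos": [4, 5], "type": "LOC"}],
    ],
    "labels": [{"h": 0, "t": 1, "r": "P19", "evidence": [0]}],
    "original_doc_id": 0,
})


def relation_triples(example):
    """Return (head name, relation id, tail name, evidence) tuples for one example.

    Assumes each label holds integer "h"/"t" indices into vertexSet, a relation
    id under "r", and an optional list of sentence ids under "evidence".
    """
    triples = []
    for label in example["labels"]:
        # Use the first mention of each entity as its display name.
        head = example["vertexSet"][label["h"]][0]["name"]
        tail = example["vertexSet"][label["t"]][0]["name"]
        triples.append((head, label["r"], tail, label.get("evidence", [])))
    return triples


def mention_tokens(example, mention):
    """Recover a mention's surface tokens from sents via its sent_id and pos."""
    start, end = mention["pos"]
    return example["sents"][mention["sent_id"]][start:end]


example = json.loads(TOY_LINE)
for triple in relation_triples(example):
    print(triple)
# → ('Alice', 'P19', 'Paris', [0])
```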

Loading the Dataset

from datasets import load_dataset
dataset = load_dataset("amodaresi/Re-DocRED-CF", "var-01")
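If the JSONL files are available locally, they can also be read with the standard library alone. The sketch below does that and uses original_doc_id to pair each counterfactual example with its seed document, assuming the seed split is held as a list in its original order and that the index is 0-based; which seed split each file corresponds to should be checked against the paper. The helpers `read_jsonl` and `pair_with_seed`, the local file path, and the toy records are hypothetical.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Yield one parsed example per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def pair_with_seed(cf_examples, seed_examples):
    """Pair each counterfactual example with the seed document it came from.

    Assumes `original_doc_id` is a 0-based index into `seed_examples`.
    """
    return [(seed_examples[ex["original_doc_id"]], ex) for ex in cf_examples]


# Demo with hypothetical toy records written to a temporary file.
seed = [{"title": "Seed 0"}, {"title": "Seed 1"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dev.jsonl")  # stand-in for a local copy of one split
    with open(path, "w", encoding="utf-8") as f:
        for record in ({"title": "CF a", "original_doc_id": 1},
                       {"title": "CF b", "original_doc_id": 0}):
            f.write(json.dumps(record) + "\n")
    cf = list(read_jsonl(path))  # fully read before the directory is removed

for seed_doc, cf_doc in pair_with_seed(cf, seed):
    print(seed_doc["title"], "->", cf_doc["title"])
# → Seed 1 -> CF a
# → Seed 0 -> CF b
```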

Citation

If you use this dataset, please cite the following paper:

@inproceedings{modarressi-covered-2024,
  title={Consistent Document-Level Relation Extraction via Counterfactuals},
  author={Ali Modarressi and Abdullatif K{\"o}ksal and Hinrich Sch{\"u}tze},
  year={2024},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  address={Miami, United States},
  publisher={Association for Computational Linguistics}
}


Topics

Relation Extraction
Counterfactual Reasoning

Source

Organization: huggingface

Created: 10/14/2024
