Re-DocRED-CF
Re‑DocRED‑CF is a counterfactual dataset for document‑level relation extraction, generated by entity replacement. It contains five counterfactual variants, each with training, development, and test splits, plus a mixed training set. Each example includes document title, relation labels, entity vertex sets, tokenized sentences, and the original document ID indicating its index in the seed dataset.
Description
Re‑DocRED‑CF Dataset Overview
Dataset Description
Re‑DocRED‑CF is a counterfactual dataset for document‑level relation extraction (RE), created by replacing entities to evaluate and mitigate factual bias in document‑level RE.
Dataset Structure
The dataset comprises five counterfactual variants, each containing the following files:
train.jsonl
dev.jsonl
test.jsonl
train_mix.jsonl
Variant List
var-01
var-02
var-03
var-04
var-05
var-06
var-07
var-08
var-09
Data Format
Each data file includes the following fields:
title: document title.
labels: list of relations; each entry links a head entity to a tail entity and may include supporting evidence sentences.
vertexSet: list of entity vertices, each representing all mentions of an entity and its type within the document.
sents: tokenized sentences.
original_doc_id: index of the example in the original seed dataset.
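As a rough illustration of how these fields fit together, the sketch below resolves each relation label into a (head, relation, tail) triple and collects its evidence sentences. The record is made up for the example, and the inner key names (`h`, `t`, `r`, `evidence`, `name`, `sent_id`, `pos`, `type`) are assumptions taken from the usual DocRED convention, not something the card states; check them against a real example before relying on them.

```python
from typing import Dict, List, Tuple

# Hypothetical record, invented for illustration. The top-level fields match
# the list above; the keys inside "labels" and "vertexSet" are ASSUMED to
# follow the DocRED convention.
example: Dict = {
    "title": "Example Document",
    "sents": [
        ["Alice", "Smith", "was", "born", "in", "Springfield", "."],
        ["She", "later", "moved", "away", "."],
    ],
    "vertexSet": [
        [{"name": "Alice Smith", "sent_id": 0, "pos": [0, 2], "type": "PER"}],
        [{"name": "Springfield", "sent_id": 0, "pos": [5, 6], "type": "LOC"}],
    ],
    "labels": [{"h": 0, "t": 1, "r": "P19", "evidence": [0]}],
    "original_doc_id": 0,
}


def relation_triples(record: Dict) -> List[Tuple[str, str, str]]:
    """Map each label to (head name, relation id, tail name).

    The first mention of each entity is used as its display name.
    """
    triples = []
    for label in record["labels"]:
        head = record["vertexSet"][label["h"]][0]["name"]
        tail = record["vertexSet"][label["t"]][0]["name"]
        triples.append((head, label["r"], tail))
    return triples


def evidence_text(record: Dict, label: Dict) -> List[str]:
    """Return the evidence sentences of a label as space-joined strings."""
    return [" ".join(record["sents"][i]) for i in label.get("evidence", [])]


print(relation_triples(example))
print(evidence_text(example, example["labels"][0]))
```

Because labels refer to entities by their index in vertexSet, a counterfactual example that replaces an entity can keep the same label structure while the mention names change.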
Loading the Dataset
from datasets import load_dataset
dataset = load_dataset("amodaresi/Re-DocRED-CF", "var-01")
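To work with more than one variant, the same call can be repeated for each configuration name. The following is a minimal sketch: the names `VARIANTS` and `load_variant` are hypothetical helpers introduced here, and it assumes every entry in the variant list is exposed as a configuration of the same repository, as `var-01` is in the snippet above.

```python
from typing import List

# Configuration names as given in the variant list (var-01 through var-09).
VARIANTS: List[str] = [f"var-{i:02d}" for i in range(1, 10)]


def load_variant(name: str, repo_id: str = "amodaresi/Re-DocRED-CF"):
    """Load one variant by configuration name (hypothetical helper).

    The import is kept inside the function so that this module can be
    imported without the `datasets` package installed. Calling the function
    needs the package and network access.
    """
    if name not in VARIANTS:
        raise ValueError(f"unknown variant: {name!r}")
    from datasets import load_dataset

    return load_dataset(repo_id, name)


if __name__ == "__main__":
    # Downloads data when run; the available split names are not assumed
    # here, so the loaded object is simply printed for inspection.
    dataset = load_variant("var-01")
    print(dataset)
```

Printing the returned object lists the splits and fields that are actually available, which is a quick way to confirm the structure before writing any processing code.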
Citation
If you use this dataset, please cite the following paper:
@inproceedings{modarressi-covered-2024,
title={Consistent Document‑Level Relation Extraction via Counterfactuals},
author={Ali Modarressi and Abdullatif K{\"o}ksal and Hinrich Sch{\"u}tze},
year={2024},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
address={Miami, United States},
publisher={Association for Computational Linguistics}
}
Source
Organization: huggingface
Created: 10/14/2024