Dataset asset | Open Source Community | Relation Extraction
DFKI-SLT/conll04
The CoNLL04 dataset is a benchmark for relation extraction tasks, containing 1,437 sentences, each with at least one relation. Sentences are annotated with entities (e.g., `Peop`, `Loc`, `Org`, `Other`) and relation types (e.g., `Located_In`, `Work_For`, `OrgBased_In`, `Live_In`, `Kill`). The dataset is in English and formatted as JSONL.
Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 7, 2024
Dataset Overview
Dataset Name: CoNLL04
Purpose: Relation extraction task
Language: English
Size: 1,437 sentences, each containing at least one relation.
Data Structure
Fields
- tokens: Text content, list of strings (one per token).
- entities: List of entities.
  - type: Entity type, string.
  - start: Start index, integer.
  - end: End index, integer.
- relations: List of relations.
  - type: Relation type, string.
  - head: Head entity index, integer.
  - tail: Tail entity index, integer.
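To make the field layout concrete, here is a minimal sketch in Python that turns one record into readable relation triples. The record is invented for illustration and is not taken from the dataset. The sketch also assumes that `tokens` is a list of token strings, that `start` and `end` are token offsets with `end` exclusive, and that `head` and `tail` index into the record's `entities` list; check these assumptions against the actual data before relying on them.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record, made up for illustration only.
record: Dict[str, Any] = {
    "tokens": ["John", "Smith", "works", "for", "Acme", "in", "Boston", "."],
    "entities": [
        {"type": "Peop", "start": 0, "end": 2},
        {"type": "Org", "start": 4, "end": 5},
        {"type": "Loc", "start": 6, "end": 7},
    ],
    "relations": [
        {"type": "Work_For", "head": 0, "tail": 1},
        {"type": "OrgBased_In", "head": 1, "tail": 2},
    ],
}


def entity_text(rec: Dict[str, Any], idx: int) -> str:
    """Return the surface text of the entity at position idx.

    Assumes start/end are token offsets and end is exclusive.
    """
    ent = rec["entities"][idx]
    return " ".join(rec["tokens"][ent["start"]:ent["end"]])


def relation_triples(rec: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Resolve each relation into (head text, relation type, tail text).

    Assumes head/tail are positions in the record's entities list.
    """
    return [
        (entity_text(rec, rel["head"]), rel["type"], entity_text(rec, rel["tail"]))
        for rel in rec["relations"]
    ]


triples = relation_triples(record)
for head, rel_type, tail in triples:
    print(f"{head} --{rel_type}--> {tail}")
```

If the offsets turn out to be inclusive or character-based in the real data, only `entity_text` needs to change.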
Splits
- Training (train): 922 samples, 358,752 bytes.
- Validation (validation): 231 samples, 94,688 bytes.
- Test (test): 288 samples, 114,248 bytes.
Configuration
- Default:
  - Train path: data/train-*
  - Validation path: data/validation-*
  - Test path: data/test-*
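As a rough sketch of how the splits might be loaded and sanity-checked, the Python snippet below uses the Hugging Face `datasets` library with the repository ID shown at the top of this page. The helper names `load_conll04` and `size_mismatches` are hypothetical, and the expected sizes are copied from the Splits list above; loading requires network access and the `datasets` package, so the import is kept inside the function.

```python
from typing import Dict, Mapping, Optional, Sized, Tuple

# Split sizes as listed in the Splits section of this page.
EXPECTED_SIZES: Dict[str, int] = {"train": 922, "validation": 231, "test": 288}


def load_conll04():
    """Load all splits of the default configuration from the Hugging Face Hub.

    Imported lazily so the size check below can be used without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    return load_dataset("DFKI-SLT/conll04")


def size_mismatches(
    splits: Mapping[str, Sized],
    expected: Mapping[str, int] = EXPECTED_SIZES,
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, actual)} for every split whose size differs.

    A missing split is reported with an actual size of None.
    """
    mismatches: Dict[str, Tuple[int, Optional[int]]] = {}
    for name, want in expected.items():
        got = len(splits[name]) if name in splits else None
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    dataset = load_conll04()
    print(size_mismatches(dataset) or "all split sizes match")
```

The check accepts any mapping from split name to a sized object, so it can be exercised on plain lists as well as on the loaded dataset.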
Citation
BibTeX:
@inproceedings{roth-yih-2004-linear,
title = "A Linear Programming Formulation for Global Inference in Natural Language Tasks",
author = "Roth, Dan and
Yih, Wen-tau",
booktitle = "Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004",
month = may # " 6 - " # may # " 7",
year = "2004",
address = "Boston, Massachusetts, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W04-2401",
pages = "1--8",
}
@article{eberts-ulges2019spert,
author = {Markus Eberts and
Adrian Ulges},
title = {Span-based Joint Entity and Relation Extraction with Transformer Pre-training},
journal = {CoRR},
volume = {abs/1909.07755},
year = {2019},
url = {http://arxiv.org/abs/1909.07755},
eprinttype = {arXiv},
eprint = {1909.07755},
timestamp = {Mon, 23 Sep 2019 18:07:15 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1909-07755.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
APA:
- Roth, D., & Yih, W. (2004). A linear programming formulation for global inference in natural language tasks. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004 (pp. 1-8). Boston, MA, USA: Association for Computational Linguistics. https://aclanthology.org/W04-2401
- Eberts, M., & Ulges, A. (2019). Span-based joint entity and relation extraction with transformer pre-training. CoRR, abs/1909.07755. http://arxiv.org/abs/1909.07755