DFKI-SLT/conll04
The CoNLL04 dataset is a benchmark for relation extraction tasks, containing 1,437 sentences, each with at least one relation. Sentences are annotated with entities (e.g., `Peop`, `Loc`, `Org`, `Other`) and relation types (e.g., `Located_In`, `Work_For`, `OrgBased_In`, `Live_In`, `Kill`). The dataset is in English and formatted as JSONL.
Description
Dataset Overview
Dataset Name: CoNLL04
Purpose: Relation extraction task
Language: English
Size: 1,437 sentences, each containing at least one relation.
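As a rough illustration of the JSONL layout, the sketch below parses one JSON object per line and tallies relation types. The two sample records are hypothetical (not taken from the dataset). The helper names `read_jsonl` and `relation_counts` are illustrative, and the key names follow the Fields list in this card. It also assumes `tokens` is stored as a list of token strings.

```python
import json
from collections import Counter

# Two hypothetical records, one JSON object per line (not real dataset rows).
SAMPLE_JSONL = """\
{"tokens": ["Ann", "lives", "in", "Oslo", "."], "entities": [{"type": "Peop", "start": 0, "end": 1}, {"type": "Loc", "start": 3, "end": 4}], "relations": [{"type": "Live_In", "head": 0, "tail": 1}]}
{"tokens": ["Oslo", "is", "in", "Norway", "."], "entities": [{"type": "Loc", "start": 0, "end": 1}, {"type": "Loc", "start": 3, "end": 4}], "relations": [{"type": "Located_In", "head": 0, "tail": 1}]}
"""


def read_jsonl(text: str) -> list:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def relation_counts(records: list) -> Counter:
    """Count how often each relation type occurs across all records."""
    return Counter(rel["type"] for rec in records for rel in rec["relations"])


records = read_jsonl(SAMPLE_JSONL)

# The card states that every sentence carries at least one relation.
assert all(rec["relations"] for rec in records)

print(relation_counts(records))
```

The same two helpers should work on a real export by passing the file contents to `read_jsonl`, provided the file really is one record per line.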
Data Structure
Fields
- tokens: Text content, string.
- entities: List of entities
  - type: Entity type, string.
  - start: Start index, integer.
  - end: End index, integer.
- relations: List of relations
  - type: Relation type, string.
  - head: Head entity index, integer.
  - tail: Tail entity index, integer.
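To make the indexing concrete, here is a minimal sketch that resolves each relation to its head and tail entity text. The record is hypothetical, and three points are assumptions rather than statements from this card: `tokens` is a list of token strings, `start`/`end` are token offsets with `end` exclusive, and `head`/`tail` are positions in the record's `entities` list. The helper names `entity_text` and `resolve_relations` are illustrative.

```python
# Hypothetical record in the field layout listed above (not a real dataset row).
record = {
    "tokens": ["John", "Smith", "works", "for", "Acme", "Corp", "in", "Boston", "."],
    "entities": [
        {"type": "Peop", "start": 0, "end": 2},
        {"type": "Org", "start": 4, "end": 6},
        {"type": "Loc", "start": 7, "end": 8},
    ],
    "relations": [
        {"type": "Work_For", "head": 0, "tail": 1},
        {"type": "OrgBased_In", "head": 1, "tail": 2},
    ],
}


def entity_text(rec: dict, index: int) -> str:
    """Join the tokens covered by entity `index` (assumes `end` is exclusive)."""
    ent = rec["entities"][index]
    return " ".join(rec["tokens"][ent["start"]:ent["end"]])


def resolve_relations(rec: dict) -> list:
    """Turn each relation into a (head text, relation type, tail text) triple."""
    return [
        (entity_text(rec, rel["head"]), rel["type"], entity_text(rec, rel["tail"]))
        for rel in rec["relations"]
    ]


triples = resolve_relations(record)
for head, rel_type, tail in triples:
    print(f"{head} --{rel_type}--> {tail}")
```

If `end` turns out to be inclusive in a given export, the slice in `entity_text` would need `ent["end"] + 1` instead.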
Splits
- Training (train): 922 samples, 358,752 bytes.
- Validation (validation): 231 samples, 94,688 bytes.
- Test (test): 288 samples, 114,248 bytes.
Configuration
- Default:
  - Train path: data/train-*
  - Validation path: data/validation-*
  - Test path: data/test-*
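One plausible way to load the default configuration is through the Hugging Face `datasets` library, sketched below. It assumes the repository is reachable on the Hub under the name in the card title and that `datasets` is installed. `load_splits` and `check_split_sizes` are illustrative helper names, and the expected counts are copied from the Splits list.

```python
REPO_ID = "DFKI-SLT/conll04"  # repository name taken from the card title

# Sample counts copied from the Splits list in this card.
EXPECTED_SPLITS = {"train": 922, "validation": 231, "test": 288}


def load_splits(repo_id: str = REPO_ID):
    """Download the default configuration and return its splits.

    Needs network access and the `datasets` package, so the import is
    deferred until the function is actually called.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def check_split_sizes(sizes: dict) -> list:
    """Return the names of splits whose size differs from the card's counts."""
    return [name for name, n in EXPECTED_SPLITS.items() if sizes.get(name) != n]
```

A typical call would be `ds = load_splits()` followed by `check_split_sizes({name: len(split) for name, split in ds.items()})`, which returns an empty list when the downloaded splits match the counts listed here.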
Citation
BibTeX:
@inproceedings{roth-yih-2004-linear,
title = "A Linear Programming Formulation for Global Inference in Natural Language Tasks",
author = "Roth, Dan and
Yih, Wen-tau",
booktitle = "Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004",
month = may # " 6 - " # may # " 7",
year = "2004",
address = "Boston, Massachusetts, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/W04-2401",
pages = "1--8",
}
@article{eberts-ulges2019spert,
author = {Markus Eberts and
Adrian Ulges},
title = {Span-based Joint Entity and Relation Extraction with Transformer Pre-training},
journal = {CoRR},
volume = {abs/1909.07755},
year = {2019},
url = {http://arxiv.org/abs/1909.07755},
eprinttype = {arXiv},
eprint = {1909.07755},
timestamp = {Mon, 23 Sep 2019 18:07:15 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1909-07755.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
APA:
- Roth, D., & Yih, W. (2004). A linear programming formulation for global inference in natural language tasks. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004 (pp. 1-8). Boston, MA, USA: Association for Computational Linguistics. https://aclanthology.org/W04-2401
- Eberts, M., & Ulges, A. (2019). Span-based joint entity and relation extraction with transformer pre-training. CoRR, abs/1909.07755. http://arxiv.org/abs/1909.07755