DFKI-SLT/conll04

The CoNLL04 dataset is a benchmark for relation extraction, containing 1,437 sentences, each with at least one relation. Sentences are annotated with four entity types (`Peop`, `Loc`, `Org`, `Other`) and five relation types (`Located_In`, `Work_For`, `OrgBased_In`, `Live_In`, `Kill`). The dataset is in English and formatted as JSONL.

Updated 6/7/2024
hugging_face

Description

Dataset Overview

Dataset Name: CoNLL04

Purpose: Relation extraction

Language: English

Size: 1,437 sentences, each containing at least one relation.
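
For model training, the entity and relation types from the summary are usually mapped to integer ids. The sketch below assumes those type lists are exhaustive (four entity types, five relation types). The reserved `NoRel` label, used by many pairwise relation classifiers for entity pairs with no relation, is a hypothetical convention and not part of the dataset.

```python
# Label inventories as listed in the dataset summary.
# Assumption: these lists are exhaustive for CoNLL04.
ENTITY_TYPES = ["Peop", "Loc", "Org", "Other"]
RELATION_TYPES = ["Located_In", "Work_For", "OrgBased_In", "Live_In", "Kill"]


def build_vocab(labels, reserved=()):
    """Map labels to consecutive integer ids; reserved labels get the lowest ids."""
    vocab = {}
    for label in [*reserved, *labels]:
        if label not in vocab:  # skip duplicates so ids stay consecutive
            vocab[label] = len(vocab)
    return vocab


entity2id = build_vocab(ENTITY_TYPES)
# "NoRel" is a hypothetical placeholder for entity pairs without a relation.
relation2id = build_vocab(RELATION_TYPES, reserved=["NoRel"])
id2relation = {idx: label for label, idx in relation2id.items()}

print(entity2id)       # {'Peop': 0, 'Loc': 1, 'Org': 2, 'Other': 3}
print(id2relation[5])  # Kill
```

Writing the lists out explicitly, rather than collecting labels from whichever split is loaded first, keeps the ids identical between training and evaluation.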

Data Structure

Fields

  • tokens: The sentence as a list of token strings.
  • entities: List of entity mentions
    • type: Entity type, string.
    • start: Start token index, integer.
    • end: End token index, integer.
  • relations: List of relations
    • type: Relation type, string.
    • head: Index of the head entity in `entities`, integer.
    • tail: Index of the tail entity in `entities`, integer.
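
A minimal sketch of reading one JSONL line and resolving its annotations into readable triples. The record is made up for illustration and is not taken from the dataset. Two assumptions are worth flagging: `start`/`end` are treated as token offsets with an exclusive `end` (so `tokens[start:end]` is the mention), and `head`/`tail` are read as positions in the record's `entities` list.

```python
import json

# A made-up record in the field layout described above (not from the dataset).
# Assumption: `end` is exclusive, so tokens[start:end] is the entity mention.
line = json.dumps({
    "tokens": ["John", "Smith", "works", "for", "Acme", "Corp", "in", "Boston", "."],
    "entities": [
        {"type": "Peop", "start": 0, "end": 2},
        {"type": "Org", "start": 4, "end": 6},
        {"type": "Loc", "start": 7, "end": 8},
    ],
    "relations": [
        {"type": "Work_For", "head": 0, "tail": 1},
        {"type": "OrgBased_In", "head": 1, "tail": 2},
    ],
})


def decode(record):
    """Return (entity_spans, relation_triples) with surface text resolved."""
    tokens = record["tokens"]
    # Each entity becomes (type, surface text) by slicing the token list.
    spans = [
        (ent["type"], " ".join(tokens[ent["start"]:ent["end"]]))
        for ent in record["entities"]
    ]
    # head/tail index into the entities list, not into the tokens.
    triples = [
        (spans[rel["head"]][1], rel["type"], spans[rel["tail"]][1])
        for rel in record["relations"]
    ]
    return spans, triples


record = json.loads(line)
spans, triples = decode(record)
for triple in triples:
    print(triple)
# ('John Smith', 'Work_For', 'Acme Corp')
# ('Acme Corp', 'OrgBased_In', 'Boston')
```

Resolving relations through the entity list, rather than storing token offsets on each relation, means an entity mention is defined once and can take part in several relations.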

Splits

  • Training (train): 922 samples, 358,752 bytes.
  • Validation (validation): 231 samples, 94,688 bytes.
  • Test (test): 288 samples, 114,248 bytes.
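
The split sizes can be summarized with a little arithmetic. The three sample counts add up to 1,441, four more than the 1,437 sentences quoted in the overview; the sketch only reports the numbers as listed and does not account for that difference. The byte figures are taken as given, without assuming what exactly they measure.

```python
# Split sizes as listed on this card: split -> (samples, bytes).
SPLITS = {
    "train": (922, 358_752),
    "validation": (231, 94_688),
    "test": (288, 114_248),
}


def split_summary(splits):
    """Return the total sample count and, per split, (share of samples, bytes per sample)."""
    total = sum(n for n, _ in splits.values())
    per_split = {name: (n / total, size / n) for name, (n, size) in splits.items()}
    return total, per_split


total, summary = split_summary(SPLITS)
print(f"total: {total} samples")  # total: 1441 samples
for name, (share, bytes_per_sample) in summary.items():
    print(f"{name:<10} {share:6.1%} of samples, {bytes_per_sample:6.1f} bytes/sample")
```

This works out to roughly a 64/16/20 train/validation/test division.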

Configuration

  • Default:
    • Train path: data/train-*
    • Validation path: data/validation-*
    • Test path: data/test-*
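
The default configuration assigns each split a glob pattern over the repository's data files. The sketch below shows that matching with the standard library's `fnmatch.fnmatchcase`; the file names are hypothetical (shard-style names are an assumption, since this page does not list the actual files), and `assign_split` is an illustrative helper.

```python
from fnmatch import fnmatchcase

# Default configuration: split name -> data file glob, as listed above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_split(path, data_files=DATA_FILES):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in data_files.items():
        if fnmatchcase(path, pattern):  # case-sensitive, platform-independent match
            return split
    return None


# Hypothetical file names, for illustration only.
files = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
for path in files:
    print(path, "->", assign_split(path))
```

In practice the Hugging Face `datasets` library resolves these patterns itself; `load_dataset("DFKI-SLT/conll04")` should return a `DatasetDict` keyed by `train`, `validation`, and `test`, though details can vary with the library version.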

Citation

BibTeX:

@inproceedings{roth-yih-2004-linear,
    title = "A Linear Programming Formulation for Global Inference in Natural Language Tasks",
    author = "Roth, Dan  and
      Yih, Wen-tau",
    booktitle = "Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004",
    month = may # " 6 - " # may # " 7",
    year = "2004",
    address = "Boston, Massachusetts, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W04-2401",
    pages = "1--8",
}
@article{eberts-ulges2019spert,
  author       = {Markus Eberts and
                  Adrian Ulges},
  title        = {Span-based Joint Entity and Relation Extraction with Transformer Pre-training},
  journal      = {CoRR},
  volume       = {abs/1909.07755},
  year         = {2019},
  url          = {http://arxiv.org/abs/1909.07755},
  eprinttype   = {arXiv},
  eprint       = {1909.07755},
  timestamp    = {Mon, 23 Sep 2019 18:07:15 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-1909-07755.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

APA:

  • Roth, D., & Yih, W. (2004). A linear programming formulation for global inference in natural language tasks. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004 (pp. 1–8). Boston, MA, USA: Association for Computational Linguistics. https://aclanthology.org/W04-2401
  • Eberts, M., & Ulges, A. (2019). Span-based joint entity and relation extraction with transformer pre-training. CoRR, abs/1909.07755. http://arxiv.org/abs/1909.07755

Topics

Relation Extraction

Source

Organization: hugging_face

Created: Unknown
