Dataset asset | Open Source Community | Text Mining | Bioinformatics

bigbio/genia_term_corpus

The GENIA Term Corpus supports the recognition of entities of interest in molecular biology, such as proteins, genes, and cells, which is a fundamental task in biomedical text mining. The GENIA technical term annotations cover physical biological entities as well as other important terminology. The annotations span 1,999 abstracts from the main GENIA corpus.

Source: hugging_face
Created: Nov 28, 2025
Updated: Dec 22, 2022
Signals: 265 views
Availability: Linked source ready
Overview

Dataset description and usage context

GENIA Term Corpus Dataset Overview

Basic Information

  • Language: English
  • License: GENIA_PROJECT_LICENSE
  • Multilinguality: Monolingual
  • Dataset Name: GENIA Term Corpus
  • Homepage: GENIA Term Corpus

Dataset Description

  • Availability: Public
  • Task: Named Entity Recognition (NER)
  • Content: Annotations of entities of interest in molecular biology (e.g., proteins, genes, cells) across 1,999 abstracts from the original GENIA corpus.
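
Because the task is NER, a common first step is to turn span-level term annotations into per-token labels. The following is a minimal sketch of that conversion using the BIO tagging scheme. It assumes that each annotation is available as a (start, end, label) character span. The example sentence, offsets, and labels are invented for illustration and are not drawn from the corpus, and `spans_to_bio` is a hypothetical helper, not part of any dataset loader.

```python
import re


def spans_to_bio(text, entities):
    """Tag whitespace-delimited tokens with BIO labels.

    `entities` is a list of (start, end, label) character spans; this
    layout is an assumption made for the sketch.
    """
    # Record each token together with its character offsets.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in sorted(entities):
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the entity if the two ranges overlap.
            if tok_start < end and tok_end > start:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical record: the sentence and spans are illustrative only.
text = "IL-2 gene expression requires NF-kappa B activation in T cells"
entities = [(0, 9, "DNA"), (30, 40, "protein"), (55, 62, "cell_type")]

for token, tag in spans_to_bio(text, entities):
    print(f"{token}\t{tag}")
```

This sketch handles only flat, non-overlapping spans. If annotations are nested or overlap, later spans overwrite the tags of earlier ones, so a different representation would be needed. Whitespace tokenization is also a simplification, and a real pipeline would likely use a biomedical tokenizer.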

Citation Information

  • Reference 1: Ohta, T., Tateisi, Y., & Kim, J.-D. (2002). The GENIA Corpus: An Annotated Research Abstract Corpus in Molecular Biology Domain. Proceedings of the Second International Conference on Human Language Technology Research, 82–86.
  • Reference 2: Kim, J.-D., Ohta, T., Tateisi, Y., & Tsujii, J. (2003). GENIA corpus - a semantically annotated corpus for bio‑textmining. Bioinformatics, 19(Suppl. 1), i180–i182.
  • Reference 3: Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., & Collier, N. (2004). Introduction to the Bio‑Entity Recognition Task at JNLPBA. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, 70–75.