bigbio/genia_term_corpus
The GENIA Term Corpus supports the recognition of entities of interest in molecular biology, such as proteins, genes, and cells, which is a fundamental task in biomedical text mining. The GENIA technical term annotations cover physical biological entities as well as other important terminology. The annotations span 1,999 abstracts drawn from the main GENIA corpus.
Updated 12/22/2022
Description
GENIA Term Corpus Dataset Overview
Basic Information
- Language: English
- License: GENIA_PROJECT_LICENSE
- Multilinguality: Monolingual
- Dataset Name: GENIA Term Corpus
- Homepage: GENIA Term Corpus
Dataset Description
- Availability: Public
- Task: Named Entity Recognition (NER)
- Content: Term annotations for entities of interest in molecular biology (e.g., proteins, genes, cells), covering 1,999 abstracts from the original GENIA corpus.
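To make the NER task concrete, the sketch below shows one common way entity annotations are turned into per-token BIO tags for training a tagger. It is an illustration only: the sentence, the character spans, the label names (`DNA`, `protein`, `cell_type`), and the helper functions are assumptions made for this example and are not taken from the corpus files or from any loader's schema.

```python
from typing import List, Tuple

# (start_char, end_char_exclusive, label); a hypothetical span format
Span = Tuple[int, int, str]


def span_of(text: str, mention: str, label: str) -> Span:
    """Locate the first occurrence of a mention and return it as a span."""
    start = text.index(mention)
    return (start, start + len(mention), label)


def whitespace_tokens(text: str) -> List[Tuple[int, int, str]]:
    """Split on whitespace, keeping character offsets for each token."""
    tokens = []
    pos = 0
    for piece in text.split():
        start = text.index(piece, pos)
        end = start + len(piece)
        tokens.append((start, end, piece))
        pos = end
    return tokens


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-/I-<label> if a span fully contains it, else O."""
    tagged = []
    for start, end, tok in whitespace_tokens(text):
        tag = "O"
        for s, e, label in spans:
            if s <= start and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged


# Toy sentence and annotations, invented for illustration.
text = "IL-2 gene expression requires NF-kappa B activation in T cells"
spans = [
    span_of(text, "IL-2 gene", "DNA"),
    span_of(text, "NF-kappa B", "protein"),
    span_of(text, "T cells", "cell_type"),
]

tagged = to_bio(text, spans)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Whitespace tokenization is used here only to keep the sketch short; real biomedical text usually needs a tokenizer that handles punctuation and hyphenated terms, and nested or overlapping terms would need a richer representation than flat BIO tags.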
Citation Information
- Reference 1: Ohta, T., Tateisi, Y., & Kim, J.-D. (2002). The GENIA Corpus: An Annotated Research Abstract Corpus in Molecular Biology Domain. Proceedings of the Second International Conference on Human Language Technology Research, 82–86.
- Reference 2: Kim, J.-D., Ohta, T., Tateisi, Y., & Tsujii, J. (2003). GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19 Suppl 1, i180–i182.
- Reference 3: Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., & Collier, N. (2004). Introduction to the Bio‑Entity Recognition Task at JNLPBA. Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, 70–75.
Topics
Bioinformatics
Text Mining
Source
Organization: hugging_face
Created: Unknown