Tags: Dataset asset, Open Source Community, Precision Medicine, Biomedical Knowledge Graph

ibm/otter_primekg

The Otter PrimeKG dataset contains 12,757,257 triples covering proteins, drugs, and diseases, and includes protein sequences, SMILES strings, and textual descriptions. It is built on PrimeKG, a precision-medicine knowledge graph that integrates 20 biomedical resources and describes 17,080 diseases with 4 million relations; PrimeKG includes nodes for 29,786 genes/proteins and 7,957 drugs. The multimodal knowledge graph (MKG) derived from PrimeKG comprises 13 modalities and 12,757,300 edges (154,130 data-property edges and 12,603,170 object-property edges), including 642,150 protein-protein interaction edges, 25,653 drug-protein interaction edges, and 2,672,628 drug-drug interaction edges.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 26, 2023

Otter PrimeKG Dataset Overview

Dataset Description

  • Name: Otter PrimeKG
  • Content: 12,757,257 triples covering proteins, drugs, and diseases. Includes protein sequences, SMILES strings, and textual information.
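The triples mix links between biomedical entities with triples whose object is a raw value, such as a protein sequence, a SMILES string, or a text description. The Python sketch below models that distinction with hypothetical record types (`Entity`, `Literal`, `Triple`) and a hypothetical helper `modality_counts`; the modality labels, relation names, and identifiers are illustrative assumptions, not the dataset's actual schema or column names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union

# Hypothetical record types: the real schema of ibm/otter_primekg may differ.


@dataclass(frozen=True)
class Entity:
    node_id: str  # a protein, drug, or disease identifier (illustrative)


@dataclass(frozen=True)
class Literal:
    modality: str  # assumed labels: "protein_sequence", "smiles", "text"
    value: str


@dataclass(frozen=True)
class Triple:
    subject: Entity
    relation: str
    obj: Union[Entity, Literal]

    def is_literal_valued(self) -> bool:
        """True when the object is a raw value rather than another graph node."""
        return isinstance(self.obj, Literal)


def modality_counts(triples: Iterable[Triple]) -> Dict[str, int]:
    """Count literal-valued triples per modality, skipping entity-to-entity links."""
    counts: Dict[str, int] = {}
    for t in triples:
        if t.is_literal_valued():
            counts[t.obj.modality] = counts.get(t.obj.modality, 0) + 1
    return counts


# Made-up example triples, not taken from the dataset.
example = [
    Triple(Entity("protein:P1"), "has_sequence", Literal("protein_sequence", "MKTAYIAKQR")),
    Triple(Entity("drug:D1"), "has_smiles", Literal("smiles", "CC(=O)OC1=CC=CC=C1C(=O)O")),
    Triple(Entity("disease:X1"), "has_description", Literal("text", "An example disease description.")),
    Triple(Entity("drug:D1"), "targets", Entity("protein:P1")),
]

print(modality_counts(example))
```

Keeping literals as a separate type makes it easy to route each modality to its own encoder (for example, a sequence model for proteins and a chemistry model for SMILES) while entity-to-entity links stay in the graph structure.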

Dataset Details

  • PrimeKG: Integrates 20 biomedical resources, describing 17,080 diseases and 4 million relations. Nodes include 29,786 genes/proteins and 7,957 drugs.
  • Multimodal Knowledge Graph (MKG): Built from PrimeKG; contains 13 modalities and 12,757,300 edges (154,130 data-property edges and 12,603,170 object-property edges). These edges include 642,150 protein-protein interaction edges, 25,653 drug-protein interaction edges, and 2,672,628 drug-drug interaction edges.
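As a quick consistency check of the MKG figures, this sketch re-adds the reported edge counts and expresses each listed interaction type as a share of all MKG edges. The constant names and the helper `check_totals` are hypothetical; only the numbers come from the description above.

```python
# Edge counts as reported for the Otter PrimeKG multimodal knowledge graph (MKG).
DATA_PROPERTY_EDGES = 154_130
OBJECT_PROPERTY_EDGES = 12_603_170
TOTAL_EDGES = 12_757_300

# Interaction edge types listed in the dataset description.
INTERACTION_EDGES = {
    "protein-protein": 642_150,
    "drug-protein": 25_653,
    "drug-drug": 2_672_628,
}


def check_totals() -> int:
    """Verify that the edge breakdown adds up and return the sum of listed interaction edges."""
    assert DATA_PROPERTY_EDGES + OBJECT_PROPERTY_EDGES == TOTAL_EDGES
    listed_sum = sum(INTERACTION_EDGES.values())
    assert listed_sum <= TOTAL_EDGES
    return listed_sum


listed = check_totals()
for name, n in INTERACTION_EDGES.items():
    share = 100 * n / TOTAL_EDGES
    print(f"{name}: {n:,} edges ({share:.2f}% of all MKG edges)")
print(f"listed interaction types: {listed:,} of {TOTAL_EDGES:,} edges")
```

The shares are computed against the total edge count rather than the object-property count, because the description does not state explicitly which of the two groups the interaction edges belong to.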

Original Dataset Information

License

  • Type: MIT
