ibm/otter_primekg
The Otter PrimeKG dataset contains 12,757,257 triples covering proteins, drugs, and diseases, and includes protein sequences, SMILES strings, and textual descriptions. It is built on PrimeKG, a precision-medicine knowledge graph that integrates 20 biomedical resources and describes 17,080 diseases with 4 million relations. PrimeKG includes nodes for 29,786 genes/proteins and 7,957 drugs. The multimodal knowledge graph (MKG) derived from PrimeKG comprises 13 modalities and 12,757,300 edges (154,130 data-property edges and 12,603,170 object-property edges), including 642,150 protein-protein interaction edges, 25,653 drug-protein interaction edges, and 2,672,628 drug-drug interaction edges.
Otter PrimeKG Dataset Overview
Dataset Description
- Name: Otter PrimeKG
- Content: 12,757,257 triples covering proteins, drugs, and diseases. Includes protein sequences, SMILES strings, and textual information.
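As a rough illustration of what such triples can look like, the sketch below separates links between entities from attribute values such as SMILES strings or text. The `Triple` type, the identifiers, and the predicate names are hypothetical and are not taken from the dataset's actual schema; the three rows are toy examples, not dataset records.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """Hypothetical in-memory form of one (subject, predicate, object) triple."""
    subject: str
    predicate: str
    obj: str
    is_literal: bool  # True when obj is an attribute value, not another entity


# Toy rows for illustration only; names and predicates are assumptions.
EXAMPLE_TRIPLES = [
    Triple("drug:aspirin", "targets", "protein:PTGS2", False),
    Triple("drug:aspirin", "has_smiles", "CC(=O)OC1=CC=CC=C1C(=O)O", True),
    Triple("protein:PTGS2", "has_description", "Prostaglandin G/H synthase 2", True),
]


def split_by_kind(triples: List[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Return (entity-to-entity links, entity-to-attribute triples)."""
    entity_links = [t for t in triples if not t.is_literal]
    attributes = [t for t in triples if t.is_literal]
    return entity_links, attributes


if __name__ == "__main__":
    links, attrs = split_by_kind(EXAMPLE_TRIPLES)
    print(f"{len(links)} entity link(s), {len(attrs)} attribute triple(s)")
```

Keeping the literal flag on each triple is one simple way to let downstream code treat relational edges and attached modalities differently; the real dataset may encode this distinction in another way.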
Dataset Details
- PrimeKG: Integrates 20 biomedical resources, describing 17,080 diseases and 4 million relations. Nodes include 29,786 genes/proteins and 7,957 drugs.
- Multimodal Knowledge Graph (MKG): Built from PrimeKG, it contains 13 modalities and 12,757,300 edges (154,130 data-property edges and 12,603,170 object-property edges), including 642,150 protein-protein interaction edges, 25,653 drug-protein interaction edges, and 2,672,628 drug-drug interaction edges.
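The edge counts quoted above can be checked with simple arithmetic. The sketch below confirms that the data-property and object-property counts add up to the stated MKG total, and expresses each interaction type as a share of that total. The constant and function names are illustrative; the figures are copied from this card.

```python
# Edge counts as quoted in this dataset card.
DATA_PROPERTY_EDGES = 154_130
OBJECT_PROPERTY_EDGES = 12_603_170
TOTAL_EDGES = 12_757_300

INTERACTION_EDGES = {
    "protein-protein": 642_150,
    "drug-protein": 25_653,
    "drug-drug": 2_672_628,
}


def edge_total(data_edges: int, object_edges: int) -> int:
    """Sum the two edge categories reported for the MKG."""
    return data_edges + object_edges


def share_of_total(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    # The two categories should account for every edge in the MKG.
    assert edge_total(DATA_PROPERTY_EDGES, OBJECT_PROPERTY_EDGES) == TOTAL_EDGES
    for name, count in INTERACTION_EDGES.items():
        print(f"{name}: {share_of_total(count, TOTAL_EDGES):.2f}% of all edges")
```

Note that the three listed interaction types cover only part of the graph; the remaining edges belong to relation types not itemized in this card.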
Original Dataset Information
- Source: GitHub Repository
- Citation: Chandak, P., Huang, K. & Zitnik, M. Building a knowledge graph to enable precision medicine. Sci Data 10, 67 (2023). https://doi.org/10.1038/s41597-023-01960-3
License
- Type: MIT
Related Models
- Classifier: ibm/otter_primekg_classifier
- DistMult: ibm/otter_primekg_distmult
- TransE: ibm/otter_primekg_transe