The Otter PrimeKG dataset contains 12,757,257 triples covering proteins, drugs, and diseases, and includes protein sequences, SMILES strings, and textual descriptions. It is built on PrimeKG, a precision-medicine knowledge graph that integrates 20 biomedical resources to describe 17,080 diseases with about 4 million relations. PrimeKG includes nodes for 29,786 genes/proteins and 7,957 drugs. The multimodal knowledge graph (MKG) derived from PrimeKG comprises 13 modalities and 12,757,300 edges (154,130 data-property edges and 12,603,170 object-property edges), including 642,150 protein-protein interaction edges, 25,653 drug-protein interaction edges, and 2,672,628 drug-drug interaction edges.
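
The edge counts quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the dataset's tooling: the constant names and the `edge_summary` helper are hypothetical, and the only inputs are the figures stated in the description. It confirms that the data-property and object-property edges sum to the stated MKG total, and it reports each interaction type as a share of all edges. It also shows that the stated triple count (12,757,257) is 43 lower than the stated edge total (12,757,300); the description does not explain that gap.

```python
# Figures copied from the dataset description; nothing here is read from the data itself.
DATA_PROPERTY_EDGES = 154_130
OBJECT_PROPERTY_EDGES = 12_603_170
STATED_TOTAL_EDGES = 12_757_300
STATED_TRIPLES = 12_757_257

# Interaction edge counts called out in the description.
INTERACTION_EDGES = {
    "protein-protein": 642_150,
    "drug-protein": 25_653,
    "drug-drug": 2_672_628,
}


def edge_summary():
    """Hypothetical helper: cross-check the stated counts against each other."""
    total = DATA_PROPERTY_EDGES + OBJECT_PROPERTY_EDGES
    # Share of each interaction type relative to all MKG edges.
    shares = {name: count / total for name, count in INTERACTION_EDGES.items()}
    return {
        "computed_total": total,
        "matches_stated_total": total == STATED_TOTAL_EDGES,
        "gap_to_triple_count": total - STATED_TRIPLES,
        "interaction_share": shares,
    }


if __name__ == "__main__":
    summary = edge_summary()
    print(f"computed total edges: {summary['computed_total']:,}")
    print(f"matches stated total: {summary['matches_stated_total']}")
    print(f"edges minus triples:  {summary['gap_to_triple_count']}")
    for name, share in summary["interaction_share"].items():
        print(f"{name:>16}: {share:.2%} of all edges")
```

The same check could be rerun against counts computed from the released files, should they differ from the figures quoted here.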