The Google DeepMind AlphaMissense database contains pathogenicity predictions for all possible single-nucleotide missense variants in human protein-coding genes, for both the hg19 and hg38 genome builds. The dataset provides gene-level average predictions, predictions for all possible single-amino-acid substitutions, and predictions for non-canonical transcript isoforms. Each file includes the chromosome, genomic position, reference and alternate nucleotides, UniProtKB identifier, transcript ID, protein variant, AlphaMissense pathogenicity score, and the classification derived from that score, among other fields. The dataset is licensed under CC BY-NC-SA 4.0 and may be used for non-commercial research only.
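
To make the per-variant fields listed above concrete, here is a minimal Python sketch of how such a file might be read and filtered by classification. It assumes a tab-separated layout with a header row and `#`-prefixed comment lines. The column names (`chrom`, `pos`, `score`, `am_class`, and so on), the class labels, and the sample rows are all hypothetical placeholders, not taken from the actual files, so check the header of the file you download before relying on any of them.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List


def iter_variants(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per variant row from tab-separated text with a header row.

    Blank lines and lines starting with '#' are skipped (assumed comment style).
    """
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    for row in reader:
        record: Dict[str, object] = dict(row)
        # Convert the numeric fields; all other fields stay as strings.
        record["pos"] = int(row["pos"])
        record["score"] = float(row["score"])
        yield record


def filter_by_class(rows: Iterable[Dict[str, object]], wanted: str) -> List[Dict[str, object]]:
    """Keep only the rows whose classification equals `wanted`."""
    return [r for r in rows if r["am_class"] == wanted]


# Synthetic placeholder rows for illustration only; these are NOT real predictions.
SAMPLE = (
    "# synthetic example, not real AlphaMissense output\n"
    "chrom\tpos\tref\talt\tuniprot_id\ttranscript_id\tprotein_variant\tscore\tam_class\n"
    "chr1\t100\tA\tG\tP00000\tENST00000000000.1\tM1V\t0.91\tlikely_pathogenic\n"
    "chr1\t100\tA\tC\tP00000\tENST00000000000.1\tM1L\t0.12\tlikely_benign\n"
)

if __name__ == "__main__":
    variants = list(iter_variants(io.StringIO(SAMPLE)))
    for v in filter_by_class(variants, "likely_pathogenic"):
        print(v["chrom"], v["pos"], v["protein_variant"], v["score"])
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object (for example one returned by `gzip.open(path, "rt")` if the file is gzip-compressed). Because the rows are read through `csv.DictReader`, any additional columns in the file are carried along unchanged.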