katielink/dm_alphamissense
The Google DeepMind AlphaMissense database contains predictions for all possible single‑nucleotide missense variants in human protein‑coding genes, covering both hg19 and hg38 genome builds. The dataset provides gene‑level average predictions, predictions for all possible single‑amino‑acid substitutions, and predictions for non‑canonical transcript isoforms. Depending on the file, fields include chromosome, genomic position, reference and alternate nucleotides, UniProtKB identifier, transcript ID, protein variant, and the AlphaMissense pathogenicity score with its classification. The dataset is released under the CC BY‑NC‑SA 4.0 license for non‑commercial research use only.
Description
Google DeepMind AlphaMissense Database
File Descriptions
- AlphaMissense_hg19.tsv.gz, AlphaMissense_hg38.tsv.gz: Predictions for all possible single‑nucleotide missense variants (71 M) from 19 k human protein‑coding genes (canonical transcripts), available for hg19 and hg38 coordinates. Files are sorted by genomic coordinate.
- AlphaMissense_gene_hg19.tsv.gz, AlphaMissense_gene_hg38.tsv.gz: Gene‑level average predictions, computed as the mean of `am_pathogenicity` across all possible missense variants in canonical transcripts.
- AlphaMissense_aa_substitutions.tsv.gz: Predictions for all possible single‑amino‑acid substitutions covering 20 k UniProt canonical isoforms (216 M protein variants). This is a superset of amino‑acid changes caused by single‑nucleotide missense variants. The file uses UniProt accessions and contains no genomic coordinates.
- AlphaMissense_isoforms_hg38.tsv.gz: Predictions for all possible missense variants in 60 k non‑canonical transcript isoforms (hg38, GENCODE V32). The file includes `transcript_id` but no UniProt accession. Predictions for non‑canonical isoforms are less thoroughly evaluated and should be used with caution. Sorted by genomic coordinate.
- AlphaMissense_isoforms_aa_substitutions.tsv.gz: Predictions for all possible single‑amino‑acid substitutions in 60 k non‑canonical transcript isoforms (GENCODE V32). This is a superset of amino‑acid changes caused by single‑nucleotide missense variants. The file includes `transcript_id` but no UniProt accession.
All transcript annotations are based on GENCODE V27 (hg19) or V32 (hg38).
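As a rough illustration of how the per-variant files relate to the gene‑level averages, the sketch below streams rows from a tab‑separated file and averages `am_pathogenicity` per `transcript_id`. It uses only the Python standard library. The header handling is an assumption about the file layout (comment lines start with `#`, and the header row may itself start with `#`, as in `#CHROM`), and the helper names, sample rows, and transcript IDs are hypothetical, not real predictions.

```python
import csv
import gzip
import io
from collections import defaultdict


def open_tsv_gz(path):
    """Open a gzip-compressed TSV file as text (hypothetical helper)."""
    return gzip.open(path, "rt", encoding="utf-8", newline="")


def read_rows(handle):
    """Yield one dict per data row from an open text handle.

    Assumed layout: single-field lines starting with '#' are comments;
    a '#' line with several tab-separated fields is the header row;
    otherwise the first non-comment line is the header.
    """
    header = None
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue
        if fields[0].startswith("#"):
            if len(fields) > 1:
                fields[0] = fields[0].lstrip("#")
                header = fields
            continue
        if header is None:
            header = fields
            continue
        yield dict(zip(header, fields))


def mean_pathogenicity_by_transcript(rows):
    """Average am_pathogenicity over all rows sharing a transcript_id."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        transcript = row["transcript_id"]
        totals[transcript] += float(row["am_pathogenicity"])
        counts[transcript] += 1
    return {t: totals[t] / counts[t] for t in totals}


# Made-up rows for illustration only; IDs and scores are not real data.
SAMPLE = (
    "# illustrative comment line\n"
    "#CHROM\tPOS\tREF\tALT\tgenome\tuniprot_id\ttranscript_id"
    "\tprotein_variant\tam_pathogenicity\tam_class\n"
    "chr1\t100\tG\tT\thg38\tP_EXAMPLE\tENST_EXAMPLE_A\tV2L\t0.25\tlikely_benign\n"
    "chr1\t101\tT\tC\thg38\tP_EXAMPLE\tENST_EXAMPLE_A\tV2A\t0.75\tlikely_pathogenic\n"
    "chr2\t200\tA\tG\thg38\tQ_EXAMPLE\tENST_EXAMPLE_B\tM5V\t0.5\tambiguous\n"
)

means = mean_pathogenicity_by_transcript(read_rows(io.StringIO(SAMPLE)))
```

For a downloaded file, the same functions could be combined as `mean_pathogenicity_by_transcript(read_rows(open_tsv_gz(path)))`. Because rows are streamed, memory use grows with the number of transcripts rather than the number of variants. Means computed this way may not match the published gene‑level files exactly if those were derived from unrounded scores.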
Column Descriptions
- CHROM: Chromosome, formatted as `chr<N>` where N ∈ [1‑22, X, Y, M].
- POS: 1‑based genomic coordinate.
- REF: Reference nucleotide (GRCh38.p13 for hg38, GRCh37.p13 for hg19).
- ALT: Alternate nucleotide.
- genome: Genome build, hg38 or hg19.
- uniprot_id: UniProtKB accession of the protein where the variant induces an amino‑acid substitution (UniProt release 2021_02).
- transcript_id: Ensembl transcript ID from GENCODE V27 (hg19) or V32 (hg38).
- protein_variant: Amino‑acid change caused by the alternate allele, formatted as `<reference_aa><POS_aa><alternate_aa>` (e.g., V2L). `POS_aa` is the 1‑based position in the protein sequence.
- am_pathogenicity: Calibrated AlphaMissense pathogenicity score (range 0‑1), interpretable as the probability that the variant is clinically pathogenic.
- am_class: Discrete class of `protein_variant`: `likely_benign`, `likely_pathogenic`, or `ambiguous`. Thresholds: `likely_benign` if `am_pathogenicity` < 0.34; `likely_pathogenic` if `am_pathogenicity` > 0.564; otherwise `ambiguous`.
- mean_am_pathogenicity: Mean of `am_pathogenicity` across all missense variants for a given transcript.
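The class thresholds and the `protein_variant` format described above can be sketched in Python as follows. This is a minimal illustration, not the code used to build the database: the function and constant names are hypothetical, and the regular expression assumes single uppercase letters for amino acids.

```python
import re
from typing import NamedTuple

# Thresholds as stated in the column descriptions.
BENIGN_BELOW = 0.34
PATHOGENIC_ABOVE = 0.564


def classify(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to its discrete class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score < BENIGN_BELOW:
        return "likely_benign"
    if score > PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    # Scores exactly at either threshold fall through to 'ambiguous'.
    return "ambiguous"


class ProteinVariant(NamedTuple):
    reference_aa: str
    position: int  # 1-based position in the protein sequence
    alternate_aa: str


# Assumed shape: one letter, a positive integer, one letter (e.g. V2L).
_VARIANT_RE = re.compile(r"^([A-Z])([1-9][0-9]*)([A-Z])$")


def parse_protein_variant(text: str) -> ProteinVariant:
    """Split a string such as 'V2L' into its three components."""
    match = _VARIANT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized protein variant: {text!r}")
    return ProteinVariant(match.group(1), int(match.group(2)), match.group(3))


variant = parse_protein_variant("V2L")
label = classify(0.9)
```

Labels recomputed this way from the rounded scores in the files may differ from the published `am_class` for scores sitting right at a threshold, so the `am_class` column is the safer source when both are available.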
License / Disclaimer
AlphaMissense database copyright (2023) DeepMind Technologies Limited. All predictions are for non‑commercial research use and follow the CC BY‑NC‑SA license.
Researchers interested in predictions not yet provided, for non‑commercial use, may send an expression of interest to alphamissense@google.com.
The AlphaMissense database and related information are provided “as‑is”, without any express or implied warranties. The information is not a substitute for professional medical advice, diagnosis, or treatment.
Citation
If you use this resource, please cite: “Accurate proteome‑wide missense variant effect prediction with AlphaMissense” Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, Pushmeet Kohli, Žiga Avsec
Use of the AlphaMissense database is subject to the Google Cloud Platform Terms of Service.
Source
Organization: hugging_face