
katielink/dm_alphamissense

The Google DeepMind AlphaMissense database contains pathogenicity predictions for all possible single‑nucleotide missense variants in human protein‑coding genes, in both hg19 and hg38 coordinates. It also provides gene‑level average predictions, predictions for all possible single‑amino‑acid substitutions, and predictions for non‑canonical transcript isoforms. Depending on the file, fields include chromosome, genomic position, reference and alternate nucleotides, UniProtKB accession, transcript ID, protein variant, and the AlphaMissense pathogenicity score with its classification. The dataset is licensed under CC BY‑NC‑SA 4.0 and is for non‑commercial research use only.

Updated 10/5/2023

Description

Google DeepMind AlphaMissense Database

File Descriptions

  • AlphaMissense_hg19.tsv.gz, AlphaMissense_hg38.tsv.gz: Predictions for all possible single‑nucleotide missense variants (71 M) from 19 k human protein‑coding genes (canonical transcripts), available for hg19 and hg38 coordinates. Files are sorted by genomic coordinate.
  • AlphaMissense_gene_hg19.tsv.gz, AlphaMissense_gene_hg38.tsv.gz: Gene‑level average predictions, computed as the mean of am_pathogenicity across all possible missense variants in the gene's canonical transcript.
  • AlphaMissense_aa_substitutions.tsv.gz: Predictions for all possible single‑amino‑acid substitutions covering 20 k UniProt canonical isoforms (216 M protein variants). This is a superset of amino‑acid changes caused by single‑nucleotide missense variants. The file uses UniProt accessions and contains no genomic coordinates.
  • AlphaMissense_isoforms_hg38.tsv.gz: Predictions for all possible missense variants in 60 k non‑canonical transcript isoforms (hg38, GENCODE V32). The file includes transcript_id but no UniProt accession. Predictions for non‑canonical isoforms are less thoroughly evaluated and should be used with caution. Sorted by genomic coordinate.
  • AlphaMissense_isoforms_aa_substitutions.tsv.gz: Predictions for all possible single‑amino‑acid substitutions in 60 k non‑canonical transcript isoforms (GENCODE V32). This is a superset of amino‑acid changes caused by single‑nucleotide missense variants. The file includes transcript_id but no UniProt accession.
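
The gene‑level files are described as a per‑transcript mean of the variant scores. A minimal sketch of that aggregation is shown here, assuming the variant rows have already been reduced to (transcript_id, am_pathogenicity) pairs. The function name and the example transcript IDs and scores are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def mean_pathogenicity_by_transcript(rows):
    """Average the pathogenicity score over all variants of each transcript.

    `rows` is any iterable of (transcript_id, am_pathogenicity) pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for transcript_id, score in rows:
        totals[transcript_id] += float(score)
        counts[transcript_id] += 1
    return {tid: totals[tid] / counts[tid] for tid in totals}


# Made-up scores for two hypothetical transcripts (not real predictions).
example_rows = [
    ("ENST_EXAMPLE_A", 0.25),
    ("ENST_EXAMPLE_A", 0.75),
    ("ENST_EXAMPLE_B", 0.5),
]
means = mean_pathogenicity_by_transcript(example_rows)
print(means)
```

Accumulating sums and counts keeps memory proportional to the number of transcripts rather than the number of variants, which matters for files with tens of millions of rows.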

All transcript annotations are based on GENCODE V27 (hg19) or V32 (hg38).
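
Because the files are gzip‑compressed TSVs with tens of millions of rows, streaming them row by row is usually more practical than loading them whole. The sketch below uses only the Python standard library. It assumes that any leading lines starting with '#' are comments and that the last such line carries the column names; that layout is an assumption, so check the first lines of the actual file before relying on it. The helper name iter_variants and the sample row (including the accession P00000) are made up for illustration.

```python
import gzip
import io


def iter_variants(handle):
    """Yield one dict per data row from an open text handle.

    Assumption: '#'-prefixed lines appear only before the data, and the
    last of them holds the tab-separated column names. If there are no
    '#' lines at all, the first line is treated as the header instead.
    """
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            header = line.split("\t")
            continue
        yield dict(zip(header, line.split("\t")))


# Made-up sample in the column order listed in this document.
sample = (
    "# made-up example, not real AlphaMissense output\n"
    "#CHROM\tPOS\tREF\tALT\tgenome\tuniprot_id\ttranscript_id"
    "\tprotein_variant\tam_pathogenicity\tam_class\n"
    "chr1\t100\tG\tT\thg38\tP00000\tENST_EXAMPLE\tV2L\t0.12\tlikely_benign\n"
)
compressed = gzip.compress(sample.encode("utf-8"))

# gzip.open accepts a path or a file object; "rt" decodes to text.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as handle:
    rows = list(iter_variants(handle))

print(rows[0]["protein_variant"], rows[0]["am_pathogenicity"])
```

For a downloaded file, the same generator could be fed from gzip.open("AlphaMissense_hg38.tsv.gz", mode="rt"), iterating lazily instead of calling list().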

Column Descriptions

  • CHROM: Chromosome, formatted as chr<N> where N ∈ [1‑22, X, Y, M].
  • POS: 1‑based genomic coordinate.
  • REF: Reference nucleotide (GRCh38.p13 for hg38, GRCh37.p13 for hg19).
  • ALT: Alternate nucleotide.
  • genome: Genome build, hg38 or hg19.
  • uniprot_id: UniProtKB accession of the protein where the variant induces an amino‑acid substitution (UniProt release 2021_02).
  • transcript_id: Ensembl transcript ID from GENCODE V27 (hg19) or V32 (hg38).
  • protein_variant: Amino‑acid change caused by the alternate allele, formatted as <reference_aa><POS_aa><alternate_aa> (e.g., V2L). POS_aa is 1‑based position in the protein sequence.
  • am_pathogenicity: Calibrated AlphaMissense pathogenicity score (range 0‑1), interpretable as the probability that the variant is clinically pathogenic.
  • am_class: Discrete class of protein_variant: likely_benign, likely_pathogenic, or ambiguous. Thresholds: likely_benign if am_pathogenicity < 0.34; likely_pathogenic if am_pathogenicity > 0.564; otherwise ambiguous.
  • mean_am_pathogenicity: Mean of am_pathogenicity across all missense variants for a given transcript.
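
The class thresholds and the protein_variant format above can be sketched in a few lines of Python. The function and constant names are hypothetical; only the threshold values (0.34 and 0.564) and the <reference_aa><POS_aa><alternate_aa> layout come from the column descriptions. The parser assumes single uppercase letters for the amino acids, which is an assumption about the files rather than something stated here.

```python
import re

# Thresholds as given in the am_class description.
BENIGN_BELOW = 0.34
PATHOGENIC_ABOVE = 0.564


def classify(score):
    """Map a pathogenicity score in [0, 1] to one of the three classes."""
    if score < BENIGN_BELOW:
        return "likely_benign"
    if score > PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    # Scores exactly on either threshold fall through to ambiguous.
    return "ambiguous"


# Assumed shape: one letter, a 1-based position, one letter (e.g. V2L).
_VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_protein_variant(text):
    """Split a string such as 'V2L' into (reference_aa, position, alternate_aa)."""
    match = _VARIANT_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised protein_variant: {text!r}")
    ref_aa, pos_aa, alt_aa = match.groups()
    return ref_aa, int(pos_aa), alt_aa


print(classify(0.12), classify(0.45), classify(0.9))
print(parse_protein_variant("V2L"))
```

Recomputing the class from the score can serve as a consistency check against the am_class column when filtering rows.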

License / Disclaimer

AlphaMissense database copyright (2023) DeepMind Technologies Limited. All predictions are provided for non‑commercial research use only, under the CC BY‑NC‑SA 4.0 license.

Researchers interested in predictions not yet provided, for non‑commercial use, may send an expression of interest to alphamissense@google.com.

The AlphaMissense database and related information are provided “as‑is”, without any express or implied warranties. The information is not a substitute for professional medical advice, diagnosis, or treatment.

Citation

If you use this resource, please cite: “Accurate proteome‑wide missense variant effect prediction with AlphaMissense” by Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, Pushmeet Kohli, and Žiga Avsec.

Use of the AlphaMissense database is subject to the Google Cloud Platform Terms of Service.

Topics

Gene Variant Prediction
Bioinformatics

Source

Organization: hugging_face

Created: Unknown
