SaProtHub/Dataset-Thermostability-FLIP
Thermal stability prediction is a regression task in which each input protein x is mapped to a label y representing its thermal stability. The dataset originates from the FLIP project: it uses the Human-cell split and excludes proteins that lack AlphaFold2 structures. It is divided at a 70% structural-similarity threshold into training, validation, and test sets containing 5,310, 706, and 706 samples, respectively. The data are stored in LMDB format and include the sample count, UniProt IDs, structure-aware sequences, pLDDT values, and fitness labels.
- Source: Hugging Face
- Created: Nov 28, 2025
- Updated: Jul 10, 2024
Dataset Overview
Dataset Description
- Task Type: Regression task
- Goal: Predict the thermal stability of proteins by mapping each input protein to a real-valued label representing its stability.
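The card does not name an evaluation metric. FLIP-style benchmarks commonly score thermostability predictions with Spearman's rank correlation between predicted and measured labels, so the following is a minimal, dependency-free sketch of that metric, assuming tied values receive averaged ranks. The function names `average_ranks` and `spearman` are illustrative and not part of any dataset tooling.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Undefined (division by zero) when either input is constant.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5
```

For example, `spearman([50.1, 62.3, 48.0], [0.2, 0.9, 0.1])` returns `1.0`, because both (hypothetical) lists order the three proteins identically. In practice, `scipy.stats.spearmanr` computes the same statistic.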
Dataset Splits
- Source: From FLIP – Benchmark tasks in fitness landscape inference for proteins
- Structure Type: AlphaFold2
- Split Criterion: 70% structural similarity
- Split Details:
  - Training set: 5,310 samples
  - Validation set: 706 samples
  - Test set: 706 samples
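The listed split sizes add up to 6,722 samples, which works out to roughly a 79 / 10.5 / 10.5 percent train/validation/test division. A quick check of that arithmetic, where `SPLITS` and `split_fractions` are illustrative names:

```python
# Split sizes as listed on this dataset card.
SPLITS = {"train": 5310, "valid": 706, "test": 706}


def split_fractions(sizes: dict) -> dict:
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


total = sum(SPLITS.values())
print(f"total: {total}")  # total: 6722
for name, frac in split_fractions(SPLITS).items():
    # prints train: 79.0%, then valid: 10.5%, then test: 10.5%
    print(f"{name}: {frac:.1%}")
```

The same helper can serve as a sanity check that locally loaded splits match the sizes on the card.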
Data Format
- Storage Format: LMDB
- Database Structure:
  - Length: Total number of samples
  - Data fields (per sample):
    - name: UniProt ID of the protein
    - seq: Structure-aware sequence
    - plddt: pLDDT values for all positions
    - fitness: Fitness label for the sequence (the regression target)
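The following is a minimal Python sketch for reading these records and unpacking the `seq` field. The on-disk key layout is an assumption: the card only says the database holds a length entry plus the four fields, so the code assumes the count sits under the key `length` and each sample under its index (`0`, `1`, and so on) as a UTF-8 JSON object. It also assumes the SaProt convention for structure-aware sequences (an uppercase amino acid followed by a lowercase Foldseek 3Di letter per residue, with `#` marking a masked structure token) and a hypothetical pLDDT cutoff of 70 for masking. Verify both assumptions against the actual files before relying on them.

```python
import json
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Any key -> value lookup, e.g. an LMDB transaction's .get or a dict's .get.
Getter = Callable[[bytes], Optional[bytes]]


def iter_records(get: Getter) -> Iterator[Dict]:
    """Yield one decoded sample per index.

    ASSUMPTION: the count is stored under b"length" and samples under
    b"0", b"1", ... as JSON objects with name/seq/plddt/fitness keys.
    """
    raw_len = get(b"length")
    if raw_len is None:
        raise KeyError("no 'length' entry found")
    for i in range(int(raw_len.decode())):
        raw = get(str(i).encode())
        if raw is None:
            raise KeyError(f"missing sample {i}")
        yield json.loads(raw.decode())


def read_lmdb(path: str) -> List[Dict]:
    """Read every sample from an LMDB directory (requires `pip install lmdb`)."""
    import lmdb  # imported lazily so the pure helpers work without it

    env = lmdb.open(path, readonly=True, lock=False)
    try:
        with env.begin() as txn:
            return list(iter_records(txn.get))
    finally:
        env.close()


def split_sa_tokens(seq: str) -> Tuple[str, str]:
    """Split a structure-aware sequence into residue and 3Di strings.

    ASSUMPTION (SaProt convention): two characters per residue, an uppercase
    amino acid followed by a lowercase 3Di letter ('#' = masked).
    """
    if len(seq) % 2:
        raise ValueError("expected an even-length structure-aware sequence")
    return seq[0::2], seq[1::2]


def mask_low_plddt(seq: str, plddt: List[float], threshold: float = 70.0) -> str:
    """Replace 3Di letters with '#' where pLDDT is below the (hypothetical) threshold."""
    residues, struct = split_sa_tokens(seq)
    if len(plddt) != len(residues):
        raise ValueError("plddt length must match the residue count")
    return "".join(
        aa + ("#" if score < threshold else s)
        for aa, s, score in zip(residues, struct, plddt)
    )
```

Keeping `iter_records` independent of the `lmdb` package means it can be exercised against a plain dict (`iter_records(store.get)`), while `read_lmdb` only adds the file handling through the standard `lmdb.open` and `Transaction.get` calls. All other names are illustrative.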