DATASET-CAPE-RhlA-seqlabel
The CAPE dataset contains mutant sequences of the RhlA enzyme and their functional evaluation metrics. The training set comprises 1,593 sequences, each paired with an enzyme activity metric for model training. The test set includes 925 unlabeled sequences whose activity participants must predict for model evaluation. The goal is to optimize rhamnolipid production and application potential by engineering RhlA mutations.
Description
CAPE Dataset: RhlA Enzyme Mutations
Dataset Introduction and Use Cases
RhlA (UniProt ID: Q51559, PDB ID: 8IK2) is a key enzyme involved in synthesizing the hydrophobic component of rhamnolipids. It determines fatty‑acid chain length and degree of unsaturation, which in turn influence the physicochemical properties and bioactivity of rhamnolipids.
Why Engineer RhlA?
Modifying RhlA enables precise control over fatty‑acid chain structure, thereby increasing rhamnolipid yield and enhancing its industrial and pharmaceutical applicability.
Dataset Description
Training Set: Saprot_CAPE_dataset_train.csv
- File format: CSV
- Number of sequences: 1,593
- Columns:
  - protein: Represents the mutation combination at six critical residues (positions 74, 101, 143, 148, 173, 176).
  - label: Enzyme activity metric indicating overall productivity.
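As a rough sketch of how the training file could be read, the snippet below parses a CSV with the two columns listed above using only the Python standard library. The helper name `load_labeled` and the sample rows are hypothetical; in particular, the six-letter strings are placeholders and do not reflect how the real `protein` column encodes the mutation combination.

```python
import csv
import io
from statistics import mean


def load_labeled(handle):
    """Read rows with `protein` and `label` columns into (protein, float) pairs."""
    reader = csv.DictReader(handle)
    # Fail early if the file does not have the two documented columns.
    missing = {"protein", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["protein"], float(row["label"])) for row in reader]


# Synthetic stand-in rows (NOT real data), used so the sketch runs without the file.
sample = io.StringIO("protein,label\nAAAAAA,0.5\nAAAAAC,1.5\n")
records = load_labeled(sample)
mean_label = mean(label for _, label in records)
print(len(records), mean_label)
```

For the real file, the same helper could be called as `load_labeled(open("Saprot_CAPE_dataset_train.csv", newline=""))`; if the file matches the description above, it should return 1,593 records.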
Test Set: Saprot_CAPE_dataset_test.csv
- Number of sequences: 925
- Description: Contains only the sequence information. Participants must predict the activity of these sequences for model evaluation. Predictions are submitted to Kaggle for performance feedback.
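To illustrate the predict-then-submit workflow, here is a minimal sketch of a constant baseline that assigns the mean training label to every test sequence and writes the result as CSV. This is not the competition's reference method. The function names, the placeholder sequences, and the output column names (`protein`, `label`) are assumptions; the actual submission format required by Kaggle is not stated here and should be checked on the competition page.

```python
import csv
import io
from statistics import mean


def mean_baseline(train_labels, test_proteins):
    """Predict the mean training label for every test sequence."""
    baseline = mean(train_labels)
    return [(protein, baseline) for protein in test_proteins]


def write_predictions(rows, handle):
    """Write (protein, prediction) pairs as CSV; the column names are assumed."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["protein", "label"])
    writer.writerows(rows)


# Synthetic labels and placeholder sequences (NOT real data).
predictions = mean_baseline([0.5, 1.5], ["AAAAAA", "CCCCCC"])
out = io.StringIO()
write_predictions(predictions, out)
print(out.getvalue())
```

A constant predictor like this is only useful as a floor: any trained model should score better on the leaderboard than the training-set mean.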
Source
Organization: huggingface
Created: 11/13/2024