
DATASET-CAPE-RhlA-seqlabel

The CAPE dataset contains mutant sequences of the RhlA enzyme together with their functional evaluation metrics. The training set comprises 1,593 sequences, each paired with an enzyme activity metric for model training. The test set comprises 925 unlabeled sequences whose activity participants must predict for model evaluation. The goal is to improve rhamnolipid production and application potential by engineering RhlA mutations.

Source

huggingface

Created

Nov 13, 2024

Updated

Nov 13, 2024

Overview

Dataset description and usage context

CAPE Dataset: RhlA Enzyme Mutations

Dataset Introduction and Use Cases

RhlA (UniProt ID: Q51559, PDB ID: 8IK2) is a key enzyme in the synthesis of the hydrophobic component of rhamnolipids. It determines fatty‑acid chain length and degree of unsaturation, which in turn influence the physicochemical properties and bioactivity of rhamnolipids.

Why Engineer RhlA?

Modifying RhlA offers control over fatty‑acid chain structure, with the aim of increasing rhamnolipid yield and improving the industrial and pharmaceutical applicability of rhamnolipids.

Dataset Description

Training Set: `Saprot_CAPE_dataset_train.csv`

File format: CSV
Number of sequences: 1,593
Columns:
- `protein`: The mutation combination at six critical residues (positions 74, 101, 143, 148, 173, 176).
- `label`: Enzyme activity metric indicating overall productivity.
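
As a rough illustration of working with the two columns described above, the sketch below reads `protein`/`label` pairs with Python's standard `csv` module and maps a combination onto the six listed positions. The helper names `load_training_rows` and `residues_by_position` are hypothetical, and the sketch assumes the `protein` value is a six-character string with one residue per position; the exact encoding is not stated here, so adjust it if the column holds full-length sequences. The sample rows are made up for demonstration.

```python
import csv
import io

# Positions named in the dataset description.
POSITIONS = (74, 101, 143, 148, 173, 176)


def load_training_rows(handle):
    """Read (protein, label) pairs from an open CSV handle.

    Assumes the header contains `protein` and `label` columns.
    """
    reader = csv.DictReader(handle)
    missing = {"protein", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["protein"], float(row["label"])) for row in reader]


def residues_by_position(combo):
    """Map a six-letter combination onto the six positions.

    ASSUMPTION: `combo` is a six-character string, one residue per
    position, ordered as in POSITIONS. Adapt this if the column
    actually holds full-length sequences.
    """
    if len(combo) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} residues, got {len(combo)}")
    return dict(zip(POSITIONS, combo))


# Illustrative in-memory data only; these are not real dataset rows.
sample = io.StringIO("protein,label\nAAAAAA,1.0\nACDEFG,0.5\n")
rows = load_training_rows(sample)
print(rows)
print(residues_by_position(rows[1][0]))
```

For the real file, the same function can be given a handle opened with `open("Saprot_CAPE_dataset_train.csv", newline="")`.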

Test Set: `Saprot_CAPE_dataset_test.csv`

Number of sequences: 925
Description: Contains only the sequence information. Participants must predict the activity of these sequences for model evaluation. Predictions are submitted to Kaggle for performance feedback.
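
To make the prediction step concrete, here is a minimal baseline sketch: it predicts the mean training label for every test sequence and writes the result as CSV. The helpers `mean_baseline` and `write_predictions` are hypothetical, and the `protein,label` output layout is an assumption that mirrors the training file; the submission format actually required on Kaggle is not specified here and should be checked before use. The values shown are made up.

```python
import csv
import io
from statistics import mean


def mean_baseline(train_labels):
    """Return a constant predictor: the mean of the training labels."""
    value = mean(train_labels)
    return lambda protein: value


def write_predictions(proteins, predict, handle):
    """Write one predicted label per test sequence.

    ASSUMPTION: the `protein,label` layout mirrors the training file;
    the real submission format may differ.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["protein", "label"])
    for protein in proteins:
        writer.writerow([protein, predict(protein)])


# Illustrative values only; these are not real dataset rows.
predict = mean_baseline([1.0, 0.5, 1.5])
out = io.StringIO()
write_predictions(["AAAAAA", "ACDEFG"], predict, out)
print(out.getvalue())
```

A constant predictor like this is only a sanity-check floor; any trained model should be compared against it before submitting.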
