Dataset asset · Open Source Community · Biotechnology · Enzyme Engineering

DATASET-CAPE-RhlA-seqlabel

The CAPE dataset pairs mutant sequences of the RhlA enzyme with measurements of their function. The training set contains 1,593 sequences, each labeled with an enzyme activity metric for model training. The test set contains 925 unlabeled sequences for model evaluation; participants must predict the activity of each one. The goal is to engineer RhlA mutants that improve rhamnolipid production and broaden the application potential of rhamnolipids.

Source: Hugging Face
Created: Nov 13, 2024
Updated: Nov 13, 2024

CAPE Dataset: RhlA Enzyme Mutations

Dataset Introduction and Use Cases

RhlA (UniProt ID: Q51559; PDB ID: 8IK2) is a key enzyme in the synthesis of the hydrophobic moiety of rhamnolipids. It determines the chain length and degree of unsaturation of the fatty-acid tails, which in turn shape the physicochemical properties and bioactivity of rhamnolipids.

Why Engineer RhlA?

Modifying RhlA offers precise control over fatty-acid chain structure, which can increase rhamnolipid yield and broaden the industrial and pharmaceutical applications of rhamnolipids.

Dataset Description

Training Set: Saprot_CAPE_dataset_train.csv

  • File format: CSV
  • Number of sequences: 1,593
  • Columns:
    • protein: The mutation combination, i.e., the residues present at six critical positions (74, 101, 143, 148, 173, 176).
    • label: Enzyme activity metric reflecting overall productivity.
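A minimal sketch for reading the training file with only the Python standard library. It assumes the CSV has a header row containing the protein and label columns described above. The card does not say whether protein holds just the six-residue combination or a full-length sequence, so the hypothetical helper residues_at_sites accepts either: a six-character value is read as one residue per position in the listed order, and a longer value is treated as a full sequence numbered from 1. Both readings are assumptions to verify against the actual file.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# The six residue positions named in the dataset card, in the listed order.
POSITIONS = (74, 101, 143, 148, 173, 176)


def load_training_rows(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Parse CSV lines with 'protein' and 'label' columns into (protein, label) pairs."""
    reader = csv.DictReader(lines)
    missing = {"protein", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [(row["protein"].strip(), float(row["label"])) for row in reader]


def residues_at_sites(protein: str) -> Dict[int, str]:
    """Return {position: residue} for the six critical sites (hypothetical helper)."""
    seq = protein.strip().upper()
    if len(seq) == len(POSITIONS):
        # ASSUMPTION A: the value is the six-residue combination itself,
        # one letter per position in the order of POSITIONS.
        return dict(zip(POSITIONS, seq))
    if len(seq) >= max(POSITIONS):
        # ASSUMPTION B: the value is a full-length sequence numbered from 1.
        return {pos: seq[pos - 1] for pos in POSITIONS}
    raise ValueError(f"cannot locate the six sites in {protein!r}")
```

For example, opening Saprot_CAPE_dataset_train.csv with newline="" and passing the file object to load_training_rows should yield 1,593 pairs if the file matches the description above.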

Test Set: Saprot_CAPE_dataset_test.csv

  • File format: CSV
  • Number of sequences: 925
  • Description: Contains sequence information only, without activity labels. Participants must predict the activity of each sequence; predictions are submitted to Kaggle, which returns performance feedback.