This project aims to build a rich dataset from the PepBDB database for machine-learning and computational-biology research. The pipeline processes peptide-protein interaction data, extracts sequences, and adds various biochemical features, producing a tabular dataset suitable for Random Forest, XGBoost, and other analyses. Each row corresponds to a single residue and is labeled as binding (1) or non-binding (0).
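To make the residue-per-row layout concrete, here is a minimal Python sketch of how such rows could be assembled and written as CSV. It is an illustration under stated assumptions, not the project's actual pipeline. The helper names (`residue_rows`, `to_csv`), the column names, the toy chain ID and sequence, and the use of Kyte-Doolittle hydropathy as the single feature are all hypothetical. The dataset's real feature set is not specified here.

```python
import csv
import io

# Kyte-Doolittle hydropathy scale, used here as ONE example biochemical
# feature; the dataset's actual features are an assumption and may differ.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

FIELDS = ["chain_id", "position", "residue", "hydropathy", "label"]


def residue_rows(chain_id, sequence, binding_positions):
    """Yield one row per residue (hypothetical helper).

    binding_positions holds 1-based positions of residues assumed to
    contact the peptide; those rows get label 1, all others label 0.
    """
    for pos, aa in enumerate(sequence, start=1):
        yield {
            "chain_id": chain_id,
            "position": pos,
            "residue": aa,
            # None (written as an empty CSV cell) for non-standard residues
            "hydropathy": KYTE_DOOLITTLE.get(aa),
            "label": 1 if pos in binding_positions else 0,
        }


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Toy chain and binding positions, NOT real PepBDB data.
    rows = list(residue_rows("1abc_A", "MKLVE", {2, 3}))
    print(to_csv(rows))
    labels = [r["label"] for r in rows]
    print("binding:", sum(labels), "non-binding:", len(labels) - sum(labels))
```

Because binding residues are usually a small minority of all residues, checking the label counts like this before training a Random Forest or XGBoost model helps decide whether class weighting or resampling is needed.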