CHEN11, ASTEX, metapocket2 datasets, FPTRAIN, HOLO4K
CHEN11: 251 proteins with 476 ligands for ligand‑binding site (LBS) prediction benchmarks. ASTEX: the Astex Diverse Set. metapocket2: includes U/B48 (48 proteins in bound and unbound states), DT198 (198 drug‑target complexes) and B210 (210 bound‑state proteins). FPTRAIN: dataset for training the Fpocket pocket‑scoring function. HOLO4K: a large set of protein‑ligand complexes, including multi‑chain structures, downloaded directly from the PDB.
Updated 4/11/2024
GitHub
Description
Dataset Overview
Main Protein Datasets
- CHEN11: Contains 251 proteins with a total of 476 ligands for LBS prediction benchmarking.
- ASTEX: the Astex Diverse Set.
- metapocket2 dataset series:
  - U/B48: 48 proteins in both bound and unbound states.
  - DT198: 198 drug‑target complexes.
  - B210: benchmark dataset of 210 bound‑state proteins.
- FPTRAIN: Dataset used for training the Fpocket pocket‑scoring function.
- HOLO4K: Large protein‑ligand complex dataset containing multi‑chain structures, non‑overlapping with CHEN11 and JOINED.
Dataset Variants
- "Standard" datasets: a single column listing ligand‑bound proteins.
- "*(mlig)*" datasets: the relevant ligands are specified explicitly, with ligand codes taken from the MOAD 2013 database.
- Datasets with predictions: contain predictions from other ligand‑binding site prediction methods.
- "*-XXsubset-*" datasets: subsets of the original datasets containing only the proteins for which a particular method succeeded and produced predictions.
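As a rough illustration of how the "*(mlig)*" and "*-XXsubset-*" variants relate to a plain protein list, the Python sketch below parses an mlig‑style listing and derives a success subset. The line layout it expects (a protein file followed by comma‑separated ligand codes, with an optional HEADER: line and # comments) is an assumption made for illustration, not a documented format. The helper names parse_mlig and success_subset are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def parse_mlig(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each protein file to its explicitly listed ligand codes.

    Hypothetical layout (an assumption, not a documented format):
    "<protein-file> <CODE1,CODE2,...>" per line. Blank lines, '#' comments
    and a 'HEADER:' line are skipped.
    """
    ligands: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("HEADER:"):
            continue
        protein, *rest = line.split()
        codes = rest[0].split(",") if rest else []
        # De-duplicate and sort so the result does not depend on input order.
        ligands[protein] = sorted({code for code in codes if code})
    return ligands


def success_subset(proteins: List[str], predicted: Set[str]) -> List[str]:
    """Keep only the proteins a method produced predictions for,
    preserving the parent dataset's order (the idea behind *-XXsubset-* files)."""
    return [p for p in proteins if p in predicted]
```

For example, parse_mlig(["HEADER: protein ligand_codes", "example/protA.pdb HEM"]) returns {'example/protA.pdb': ['HEM']}. Keeping the parent order in success_subset means a subset can be compared line by line with the dataset it was derived from.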
Dataset Caveats
- *.ds files may list only a subset of the PDB files in the corresponding directory. For example, the holo4k/ directory holds 4,543 PDB files, but holo4k.ds lists only 4,009 lines, which is the correct protein count used in the P2Rank/PrankWeb paper for the HOLO4K dataset.
- 1xgf.pdb has been removed from the HOLO4K dataset because it contains only UNK groups and no ligand.
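The caveat about *.ds files can be checked locally. This Python sketch, assuming a local copy of the datasets with each .ds file next to its PDB directory, lists PDB files that sit in a directory but are not referenced by the .ds file. It assumes the protein path is the first column of each record (as in "standard" datasets), skips blank lines, # comments and HEADER: lines, and compares by file name only. The helper names listed_proteins and unlisted_pdb_files are hypothetical.

```python
from pathlib import Path
from typing import List, Set


def listed_proteins(ds_file: Path) -> Set[str]:
    """Return the file names referenced in the first column of a .ds file.

    Assumptions: one record per line; blank lines, '#' comments and a
    'HEADER:' line are skipped; the protein path is the first
    whitespace-separated column.
    """
    names: Set[str] = set()
    for raw in ds_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("HEADER:"):
            continue
        # Compare by bare file name so "holo4k/xxxx.pdb" matches "xxxx.pdb".
        names.add(Path(line.split()[0]).name)
    return names


def unlisted_pdb_files(pdb_dir: Path, ds_file: Path) -> List[str]:
    """Return, sorted, the *.pdb files in pdb_dir that ds_file does not reference."""
    on_disk = {p.name for p in pdb_dir.glob("*.pdb")}
    return sorted(on_disk - listed_proteins(ds_file))
```

Called as unlisted_pdb_files(Path("holo4k"), Path("holo4k.ds")), the counts in the caveat suggest about 4,543 - 4,009 = 534 unlisted files, provided every .ds entry points to a distinct .pdb file in that directory.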
Topics
Bioinformatics
Drug Discovery
Source
Organization: GitHub
Created: 5/18/2018