CHEN11, ASTEX, metapocket2 datasets, FPTRAIN, HOLO4K

CHEN11: 251 proteins with 476 ligands, used for benchmarking ligand‑binding site (LBS) prediction. ASTEX: the Astex Diverse Set. metapocket2: includes U/B48 (48 proteins in bound and unbound states), DT198 (198 drug‑target complexes), and B210 (210 bound‑state proteins). FPTRAIN: the dataset used to train the Fpocket pocket‑scoring function. HOLO4K: a large set of protein‑ligand complexes, including large multi‑chain structures, downloaded directly from the PDB.

Updated 4/11/2024

Description

Dataset Overview

Main Protein Datasets

  1. CHEN11: Contains 251 proteins with a total of 476 ligands for LBS prediction benchmarking.
  2. ASTEX: The Astex Diverse Set.
  3. metapocket2 dataset series:
    • U/B48: 48 proteins in both bound and unbound states.
    • DT198: 198 drug‑target complexes.
    • B210: Benchmark dataset of 210 bound‑state proteins.
  4. FPTRAIN: Dataset used for training the Fpocket pocket‑scoring function.
  5. HOLO4K: Large protein‑ligand complex dataset containing multi‑chain structures, non‑overlapping with CHEN11 and JOINED.

Dataset Variants

  • "standard": Contains a single column of ligand‑bound proteins.
  • *(mlig)* dataset: Explicitly specifies associated ligands; ligand codes are sourced from the MOAD 2013 database.
  • Prediction‑included datasets: Contain predictions from other ligand‑binding site prediction methods.
  • *-XXsubset-* datasets: Subsets of the original datasets where a specific method succeeded and produced predictions.

Dataset Caveats

  • *.ds files may list only a subset of the PDB files in the corresponding directory. For example, the holo4k/ directory holds 4,543 PDB files, but holo4k.ds lists only 4,009 lines; 4,009 is the protein count used for the HOLO4K dataset in the P2Rank/PrankWeb paper.
  • 1xgf.pdb has been removed from the holo4k dataset (contains only UNK groups and no ligand).
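
The mismatch between a directory and its *.ds file described above can be checked with a short script. The following is a minimal sketch, not part of the dataset tooling. It assumes that each data line of a .ds file starts with a relative path to a structure file, that lines starting with "#" are comments, and that lines starting with "PARAM." or "HEADER:" are metadata. The function names read_ds_entries and unlisted_structures are hypothetical.

```python
from pathlib import Path


def read_ds_entries(ds_text: str) -> list[str]:
    """Return the first column (assumed: structure path) of each data line."""
    entries = []
    for raw in ds_text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments carry no data.
        if not line or line.startswith("#"):
            continue
        # Assumption: PARAM.* and HEADER: lines are metadata, not proteins.
        if line.startswith("PARAM.") or line.startswith("HEADER:"):
            continue
        entries.append(line.split()[0])
    return entries


def unlisted_structures(ds_path, structure_dir) -> list[str]:
    """Names of *.pdb files present in structure_dir but not listed in the .ds file."""
    listed = {Path(entry).name for entry in read_ds_entries(Path(ds_path).read_text())}
    on_disk = {p.name for p in Path(structure_dir).glob("*.pdb")}
    return sorted(on_disk - listed)
```

If these format assumptions hold, len(read_ds_entries(Path("holo4k.ds").read_text())) would be expected to give the 4,009 count mentioned above, and unlisted_structures("holo4k.ds", "holo4k") would list the PDB files that are present in the directory but not part of the dataset.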

Topics

Bioinformatics
Drug Discovery

Source

Organization: github

Created: 5/18/2018
