maomlab/AqSolDB
AqSolDB, created by the Autonomous Materials Discovery (AMD) research group, contains aqueous solubility data for 9,982 unique compounds aggregated from nine publicly available solubility datasets. It is the largest publicly accessible dataset of its kind, serving both as a reference for measured solubility values and as generalizable training data for data‑driven models. Each compound comes with a standardized and validated molecular representation, a set of 2D descriptors, and a reliability label.
Aqueous Solubility Database (AqSolDB)
Dataset Overview
AqSolDB is a dataset containing solubility values for 9,982 distinct compounds, compiled from nine different publicly available aqueous solubility datasets.
Dataset Information
- Language: English
- License: MIT
- Source: Curated
- Task Category: Tabular Regression
- Tags: Chemistry, Cheminformatics
- Size Category: 1K < n < 10K
- Config Name: AqSolDB
Data Files
- Config: AqSolDB
  - Test Set:
    - Path: AqSolDB/test.csv
    - File Size: 578736 bytes
    - Samples: 2494
  - Training Set:
    - Path: AqSolDB/train.csv
    - File Size: 1737344 bytes
    - Samples: 7488
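The listing above can be checked against a local copy of the files. The following is a minimal sketch that uses only the Python standard library; it assumes the two CSV files have already been downloaded into a local `AqSolDB/` directory, and the helper names `count_rows` and `check_split_sizes` are hypothetical, not part of any published API.

```python
import csv
from pathlib import Path

# Row counts stated in the Data Files listing.
EXPECTED_SAMPLES = {"train": 7488, "test": 2494}

# The two splits together should cover every compound: 7488 + 2494 = 9982.
TOTAL_COMPOUNDS = sum(EXPECTED_SAMPLES.values())


def count_rows(csv_path):
    """Count data rows (header excluded) in a CSV file.

    csv.DictReader is used instead of counting lines so that quoted
    fields containing commas or newlines are handled correctly.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def check_split_sizes(root="AqSolDB"):
    """Return {split: (rows_found, rows_expected)} for train and test.

    `root` is an assumed local directory holding train.csv and test.csv.
    """
    report = {}
    for split, expected in EXPECTED_SAMPLES.items():
        path = Path(root) / f"{split}.csv"
        report[split] = (count_rows(path), expected)
    return report
```

Calling `check_split_sizes()` returns the found and expected counts side by side, so a truncated download shows up as a mismatch rather than as a silent gap in the data.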
Features
- ID: string
- Name: string
- InChI: string
- InChIKey: string
- SMILES: string
- Solubility: float64
- SD: float64
- Ocurrences: int64
- Group: string
- MolWt: float64
- MolLogP: float64
- MolMR: float64
- HeavyAtomCount: float64
- NumHAcceptors: float64
- NumHDonors: float64
- NumHeteroatoms: float64
- NumRotatableBonds: float64
- NumValenceElectrons: float64
- NumAromaticRings: float64
- NumSaturatedRings: float64
- NumAliphaticRings: float64
- RingCount: float64
- TPSA: float64
- LabuteASA: float64
- BalabanJ: float64
- BertzCT: float64
- ClusterNo: int64
- MolCount: int64
- group: string
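Since the task category is tabular regression, a typical first step is to convert the raw CSV strings to the types listed above and separate the descriptors from the `Solubility` target. The sketch below does this with the standard library only. The helper names (`parse_row`, `load_rows`, `split_features_target`) are hypothetical, and treating empty cells as missing values is an assumption rather than documented behaviour of the dataset.

```python
import csv

# Column types taken from the Features list. "Ocurrences" is spelled the
# way the column name is given in that list.
STRING_COLUMNS = {"ID", "Name", "InChI", "InChIKey", "SMILES", "Group", "group"}
INT_COLUMNS = {"Ocurrences", "ClusterNo", "MolCount"}
TARGET = "Solubility"


def parse_row(raw):
    """Convert one csv.DictReader row (all strings) to typed values."""
    typed = {}
    for column, value in raw.items():
        if column in STRING_COLUMNS:
            typed[column] = value
        elif value == "":
            typed[column] = None  # assumption: empty cell means missing
        elif column in INT_COLUMNS:
            typed[column] = int(float(value))  # tolerates "3" and "3.0"
        else:
            typed[column] = float(value)  # all remaining columns are float64
    return typed


def load_rows(handle):
    """Read typed rows from an open CSV file or other text stream."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


def split_features_target(rows, descriptors):
    """Build a feature matrix X and a target vector y from typed rows."""
    X = [[row[name] for name in descriptors] for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y
```

For example, `split_features_target(rows, ["MolWt", "MolLogP", "TPSA"])` yields plain Python lists that can be passed to most regression libraries. Note that the lowercase `group` column is distinct from `Group`, so both are kept as separate string fields.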
Citation
@article{
  author = {Murat Cihan Sorkun and Abhishek Khetan and Süleyman Er},
  title = {AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds},
  journal = {Scientific Data},
  year = {2019},
  volume = {6},
  number = {143},
  month = {aug},
  url = {https://www.nature.com/articles/s41597-019-0151-1},
  publisher = {Springer Nature}
}