maomlab/AqSolDB
AqSolDB, created by the Autonomous Materials Discovery (AMD) research group, contains aqueous solubility data for 9,982 unique compounds aggregated from nine publicly available solubility datasets. It is the largest publicly accessible dataset of its kind, and it serves both as a reference for measured solubility values and as training data for data-driven models that need to generalize across diverse compounds. Each compound comes with a standardized and validated molecular representation, a set of 2D descriptors, and a reliability label.
Description
Aqueous Solubility Database (AqSolDB)
Dataset Overview
AqSolDB is a dataset containing solubility values for 9,982 distinct compounds, compiled from nine different publicly available aqueous solubility datasets.
Dataset Information
- Language: English
- License: MIT
- Source: Curated
- Task Category: Tabular Regression
- Tags: Chemistry, Cheminformatics
- Size Category: 1K < n < 10K
- Config Name: AqSolDB
Data Files
- Config: AqSolDB
  - Test Set:
    - Path: AqSolDB/test.csv
    - File Size: 578736 bytes
    - Samples: 2494
  - Training Set:
    - Path: AqSolDB/train.csv
    - File Size: 1737344 bytes
    - Samples: 7488
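As a quick consistency check on the sample counts listed above, the following sketch sums the two splits and reports each split's share of the total. The names `SPLIT_SIZES` and `split_fractions` are illustrative only and are not part of the dataset or any library.

```python
# Sample counts copied from the Data Files listing above.
SPLIT_SIZES = {"train": 7488, "test": 2494}


def split_fractions(sizes):
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total_rows = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

print(total_rows)  # 9982, matching the number of unique compounds
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")  # train: 75.0%, test: 25.0%
```

The two splits together account for all 9,982 compounds, in roughly a 75/25 train/test ratio.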
Features
- ID: string
- Name: string
- InChI: string
- InChIKey: string
- SMILES: string
- Solubility: float64
- SD: float64
- Ocurrences: int64
- Group: string
- MolWt: float64
- MolLogP: float64
- MolMR: float64
- HeavyAtomCount: float64
- NumHAcceptors: float64
- NumHDonors: float64
- NumHeteroatoms: float64
- NumRotatableBonds: float64
- NumValenceElectrons: float64
- NumAromaticRings: float64
- NumSaturatedRings: float64
- NumAliphaticRings: float64
- RingCount: float64
- TPSA: float64
- LabuteASA: float64
- BalabanJ: float64
- BertzCT: float64
- ClusterNo: int64
- MolCount: int64
- group: string
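The feature list above can be turned into a small typed reader. The sketch below uses only the Python standard library and assumes that the CSV header matches the feature names exactly as listed (including the spelling `Ocurrences` and both `Group` and `group`). `SCHEMA` and `parse_rows` are hypothetical helper names, and the demo row holds placeholder values, not real AqSolDB data.

```python
import csv
import io

# Column names and types transcribed from the feature list above
# (string -> str, int64 -> int, float64 -> float).
STRING_COLS = ["ID", "Name", "InChI", "InChIKey", "SMILES", "Group", "group"]
INT_COLS = ["Ocurrences", "ClusterNo", "MolCount"]
FLOAT_COLS = [
    "Solubility", "SD", "MolWt", "MolLogP", "MolMR", "HeavyAtomCount",
    "NumHAcceptors", "NumHDonors", "NumHeteroatoms", "NumRotatableBonds",
    "NumValenceElectrons", "NumAromaticRings", "NumSaturatedRings",
    "NumAliphaticRings", "RingCount", "TPSA", "LabuteASA", "BalabanJ",
    "BertzCT",
]
SCHEMA = {
    **{col: str for col in STRING_COLS},
    **{col: int for col in INT_COLS},
    **{col: float for col in FLOAT_COLS},
}


def parse_rows(handle):
    """Yield one dict per CSV row, with values cast according to SCHEMA."""
    reader = csv.DictReader(handle)
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        yield {col: cast(row[col]) for col, cast in SCHEMA.items()}


# Demo on an in-memory CSV with PLACEHOLDER values (not real data).
placeholder = {str: "placeholder", int: "1", float: "0.5"}
header = list(SCHEMA)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow([placeholder[SCHEMA[col]] for col in header])
buffer.seek(0)

rows = list(parse_rows(buffer))
print(len(SCHEMA), type(rows[0]["Solubility"]).__name__)
```

To read the actual files, the same function could be given `open("AqSolDB/train.csv", newline="", encoding="utf-8")` once the file has been downloaded. Empty cells would make the `int` and `float` casts raise `ValueError`, so a real loader may need to handle missing values explicitly.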
Citation
@article{
  author = {Murat Cihan Sorkun and Abhishek Khetan and Süleyman Er},
  title = {AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds},
  journal = {Scientific Data},
  year = {2019},
  volume = {6},
  number = {143},
  month = {aug},
  url = {https://www.nature.com/articles/s41597-019-0151-1},
  publisher = {Springer Nature}
}
Source
Organization: hugging_face