
maomlab/AqSolDB

AqSolDB, created by the Autonomous Materials Discovery (AMD) research group, contains aqueous solubility data for 9,982 unique compounds aggregated from nine publicly available solubility datasets. It is the largest publicly accessible dataset of its kind, serving both as a reference for measured solubility values and as more generalizable training data for data‑driven models. Each compound comes with 2D descriptors, a standardized and validated molecular representation, and a reliability label.

Updated 8/1/2025
hugging_face

Description

Aqueous Solubility Database (AqSolDB)

Dataset Overview

AqSolDB is a dataset containing solubility values for 9,982 distinct compounds, compiled from nine different publicly available aqueous solubility datasets.

Dataset Information

  • Language: English
  • License: MIT
  • Source: Curated
  • Task Category: Tabular Regression
  • Tags: Chemistry, Cheminformatics
  • Size Category: 1K < n < 10K
  • Config Name: AqSolDB
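Since the card lists the task as tabular regression, the following is a minimal sketch of the simplest possible baseline: an ordinary least-squares line predicting the target from a single descriptor. The function names `fit_line` and `rmse` are hypothetical helpers, and the choice of one descriptor is an assumption made for illustration, not a method taken from the dataset authors.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(xs, ys, slope, intercept):
    """Root-mean-square error of the fitted line on (xs, ys)."""
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))
```

In practice one would fit on a descriptor column (for example MolLogP) from the training split and report `rmse` on the test split; a single-descriptor line is only a sanity-check baseline, not a competitive model.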

Data Files

  • Config: AqSolDB
    • Test Set:
      • Path: AqSolDB/test.csv
      • File Size: 578736 bytes
      • Samples: 2494
    • Training Set:
      • Path: AqSolDB/train.csv
      • File Size: 1737344 bytes
      • Samples: 7488
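As a rough sketch of how the two files might be read, the snippet below uses only the Python standard library. It assumes the CSV files have already been downloaded to local paths matching the ones listed above; `load_split` and `split_fractions` are hypothetical helper names, not part of any published tooling for this dataset.

```python
import csv

# Assumption: local copies of the files, stored under the paths listed above.
TRAIN_PATH = "AqSolDB/train.csv"
TEST_PATH = "AqSolDB/test.csv"


def load_split(path):
    """Read one split into a list of dicts; every value is still a string."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def split_fractions(n_train, n_test):
    """Return the share of all rows that falls into each split."""
    total = n_train + n_test
    return n_train / total, n_test / total
```

With the sample counts quoted above, `split_fractions(7488, 2494)` gives roughly 0.75 and 0.25, and the two counts sum to the 9,982 compounds mentioned in the overview.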

Features

  • ID: string
  • Name: string
  • InChI: string
  • InChIKey: string
  • SMILES: string
  • Solubility: float64
  • SD: float64
  • Ocurrences: int64
  • Group: string
  • MolWt: float64
  • MolLogP: float64
  • MolMR: float64
  • HeavyAtomCount: float64
  • NumHAcceptors: float64
  • NumHDonors: float64
  • NumHeteroatoms: float64
  • NumRotatableBonds: float64
  • NumValenceElectrons: float64
  • NumAromaticRings: float64
  • NumSaturatedRings: float64
  • NumAliphaticRings: float64
  • RingCount: float64
  • TPSA: float64
  • LabuteASA: float64
  • BalabanJ: float64
  • BertzCT: float64
  • ClusterNo: int64
  • MolCount: int64
  • group: string
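Rows read from a CSV file arrive as strings, so the types listed above have to be applied explicitly. The sketch below shows one way to do that with a hypothetical helper, `cast_row`. Treating an empty cell as a missing value is an assumption, as is the choice to accept integers written with a trailing `.0`. The column name `Ocurrences` is kept exactly as the feature list spells it.

```python
# Column groups taken from the feature list above.
STRING_COLUMNS = {"ID", "Name", "InChI", "InChIKey", "SMILES", "Group", "group"}
INT_COLUMNS = {"Ocurrences", "ClusterNo", "MolCount"}


def cast_row(raw):
    """Convert one row of strings (e.g. from csv.DictReader) to typed values."""
    typed = {}
    for column, value in raw.items():
        if column in STRING_COLUMNS:
            typed[column] = value
        elif value == "":
            typed[column] = None  # assumption: an empty cell means "missing"
        elif column in INT_COLUMNS:
            typed[column] = int(float(value))  # tolerates "3" as well as "3.0"
        else:
            typed[column] = float(value)  # all remaining listed columns are float64
    return typed
```

For a made-up row such as `{"ID": "X-1", "Solubility": "-2.5", "Ocurrences": "3"}`, the helper keeps `ID` as a string, converts `Solubility` to a float, and converts `Ocurrences` to an integer.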

Citation

@article{
  author = {Murat Cihan Sorkun and Abhishek Khetan and Süleyman Er},
  title = {AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds},
  journal = {Scientific Data},
  year = {2019},
  volume = {6},
  number = {143},
  month = {aug},
  url = {https://www.nature.com/articles/s41597-019-0151-1},
  publisher = {Springer Nature}
}

Topics

Chemistry
Data Analysis

Source

Organization: hugging_face

Created: Unknown
