Dataset asset: Open Source Community, Chemistry, Data Analysis

maomlab/AqSolDB

AqSolDB, created by the Autonomous Energy Materials Discovery (AMD) research group, contains aqueous solubility data for 9,982 unique compounds aggregated from nine publicly available solubility datasets. It is the largest publicly accessible dataset of its kind, and it serves both as a reference for measured solubility values and as improved, generalizable training data for data-driven models. Each compound comes with 2D descriptors, a standardized and validated molecular representation, and a reliability label.

Source: hugging_face
Created: Nov 28, 2025
Updated: Aug 1, 2025
Overview

Dataset description and usage context

Aqueous Solubility Database (AqSolDB)

Dataset Overview

AqSolDB is a dataset containing solubility values for 9,982 distinct compounds, compiled from nine different publicly available aqueous solubility datasets.

Dataset Information

  • Language: English
  • License: MIT
  • Source: Curated
  • Task Category: Tabular Regression
  • Tags: Chemistry, Cheminformatics
  • Size Category: 1K < n < 10K
  • Config Name: AqSolDB

Data Files

  • Config: AqSolDB
    • Test Set:
      • Path: AqSolDB/test.csv
      • File Size: 578736 bytes
      • Samples: 2494
    • Training Set:
      • Path: AqSolDB/train.csv
      • File Size: 1737344 bytes
      • Samples: 7488
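
The split sizes listed above can be sanity-checked after download. The following is a minimal sketch, not part of the dataset's own tooling: it assumes the two CSV files have been saved locally under the paths shown on the card, and the helper names (`read_split`, `check_row_count`, `split_fractions`) are hypothetical.

```python
import csv

# Sample counts exactly as listed on the dataset card.
CARD_COUNTS = {"train": 7488, "test": 2494}


def read_split(lines):
    """Parse one split from an open text file (or any iterable of CSV lines)."""
    return list(csv.DictReader(lines))


def check_row_count(rows, split):
    """Raise if the row count differs from what the card states for `split`."""
    expected = CARD_COUNTS[split]
    if len(rows) != expected:
        raise ValueError(f"{split}: expected {expected} rows, found {len(rows)}")


def split_fractions(counts):
    """Return each split's share of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Usage, assuming a local copy at the path given on the card:
#
#     with open("AqSolDB/train.csv", newline="", encoding="utf-8") as handle:
#         train_rows = read_split(handle)
#     check_row_count(train_rows, "train")
```

The two counts sum to 9,982, matching the number of compounds quoted in the overview, and correspond to roughly a 75/25 train/test split.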

Features

  • ID: string
  • Name: string
  • InChI: string
  • InChIKey: string
  • SMILES: string
  • Solubility: float64
  • SD: float64
  • Ocurrences: int64
  • Group: string
  • MolWt: float64
  • MolLogP: float64
  • MolMR: float64
  • HeavyAtomCount: float64
  • NumHAcceptors: float64
  • NumHDonors: float64
  • NumHeteroatoms: float64
  • NumRotatableBonds: float64
  • NumValenceElectrons: float64
  • NumAromaticRings: float64
  • NumSaturatedRings: float64
  • NumAliphaticRings: float64
  • RingCount: float64
  • TPSA: float64
  • LabuteASA: float64
  • BalabanJ: float64
  • BertzCT: float64
  • ClusterNo: int64
  • MolCount: int64
  • group: string
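
When the CSV files are read with a plain CSV parser, every field arrives as a string. The sketch below shows one way to convert a row to the types listed above. It is an illustration under stated assumptions: the helper `coerce_row` is hypothetical, fields are assumed to be non-empty, and the column names are copied verbatim from the feature list (including the spelling `Ocurrences` and the separate `Group` and `group` columns).

```python
# Columns grouped by the dtype given in the feature list.
STRING_COLUMNS = ["ID", "Name", "InChI", "InChIKey", "SMILES", "Group", "group"]
INT_COLUMNS = ["Ocurrences", "ClusterNo", "MolCount"]
FLOAT_COLUMNS = [
    "Solubility", "SD", "MolWt", "MolLogP", "MolMR", "HeavyAtomCount",
    "NumHAcceptors", "NumHDonors", "NumHeteroatoms", "NumRotatableBonds",
    "NumValenceElectrons", "NumAromaticRings", "NumSaturatedRings",
    "NumAliphaticRings", "RingCount", "TPSA", "LabuteASA", "BalabanJ",
    "BertzCT",
]

# Map each column name to the Python type used to parse it.
SCHEMA = {
    **{name: str for name in STRING_COLUMNS},
    **{name: int for name in INT_COLUMNS},
    **{name: float for name in FLOAT_COLUMNS},
}


def coerce_row(row):
    """Convert the string values of one CSV row to the listed types.

    Only the columns present in `row` are converted; a column that is not
    in the feature list raises KeyError. Empty fields are not handled.
    """
    typed = {}
    for column, raw in row.items():
        if column not in SCHEMA:
            raise KeyError(f"unexpected column: {column}")
        typed[column] = SCHEMA[column](raw)
    return typed
```

Note that the count-like descriptors (for example `HeavyAtomCount` and `RingCount`) are listed as float64 on the card, so the sketch parses them as floats rather than integers.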

Citation

@article{
  author = {Murat Cihan Sorkun and Abhishek Khetan and Süleyman Er},
  title = {AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds},
  journal = {Scientific Data},
  year = {2019},
  volume = {6},
  number = {143},
  month = {aug},
  url = {https://www.nature.com/articles/s41597-019-0151-1},
  publisher = {Springer Nature}
}
