
scikit-fingerprints/MoleculeNet_ESOL

The MoleculeNet ESOL dataset is part of the MoleculeNet benchmark and is used for predicting aqueous solubility. The target values are log‑transformed solubilities, expressed in log mol/L. The dataset contains 1,128 samples; a scaffold split is recommended, and the recommended evaluation metric is RMSE.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jul 18, 2024

MoleculeNet ESOL Dataset Overview

Basic Information

  • Dataset Name: MoleculeNet ESOL
  • Task Types:
    • Tabular regression
    • Graph machine learning
    • Tabular classification
  • Tags:
    • Chemistry
    • Biology
    • Medicine
  • Size: 1K < n < 10K
  • Configuration:
    • Config name: default
    • Data files:
      • Split: train
      • Path: "esol.csv"
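
The configuration above points to a single CSV file exposed as a train split. A minimal sketch of reading such a file with Python's standard library follows. The column names `SMILES` and `label` and the local path are assumptions that the listing does not confirm, so check the file's header before relying on them.

```python
import csv
from typing import List, Tuple


def load_esol(path: str,
              smiles_col: str = "SMILES",  # assumed column name
              target_col: str = "label"    # assumed column name
              ) -> Tuple[List[str], List[float]]:
    """Read molecule strings and solubility targets from a CSV file."""
    smiles: List[str] = []
    targets: List[float] = []
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first row of the file as the header.
        for row in csv.DictReader(handle):
            smiles.append(row[smiles_col])
            targets.append(float(row[target_col]))
    return smiles, targets
```

With a local copy of the file, `smiles, targets = load_esol("esol.csv")` should return 1,128 entries in each list if the file matches the description on this page. The Hugging Face `datasets` library's `load_dataset` function is another likely route, though that path was not verified here.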

Task Description

  • Task: Predict aqueous solubility
  • Target: Log‑transformed solubility, in units of log mol/L (log of moles per litre)
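
To make the target unit concrete, the sketch below converts between a log solubility value and a concentration in mol/L. It assumes a base-10 logarithm, which is the usual convention for logS but is not stated explicitly on this page. The function names are illustrative only.

```python
import math


def log_s_to_mol_per_litre(log_s: float) -> float:
    """Convert a log solubility to mol/L, assuming a base-10 logarithm."""
    return 10.0 ** log_s


def mol_per_litre_to_log_s(concentration: float) -> float:
    """Convert a solubility in mol/L to its base-10 logarithm."""
    if concentration <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    return math.log10(concentration)
```

Under that assumption, a target of -2.0 corresponds to 0.01 mol/L, and each unit of error in the target is a factor of ten in concentration.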

Dataset Features

  • Number of Tasks: 1
  • Task Type: Regression
  • Total Samples: 1,128
  • Recommended Split: scaffold
  • Recommended Metric: RMSE
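
The recommended split and metric can be sketched in plain Python. This is an illustrative version, not the benchmark's reference implementation. It assumes one scaffold string per molecule has already been computed elsewhere (for example with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`). The 80/10/10 fractions are also an assumption, since this page does not specify split sizes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str],
                   frac_train: float = 0.8,   # assumed fraction
                   frac_valid: float = 0.1    # assumed fraction
                   ) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole scaffold groups to train/valid/test, largest first."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for group in ordered:
        # A group is never divided, so no scaffold spans two subsets.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

Because molecules sharing a scaffold always land in the same subset, the test set contains structures unlike those seen in training, which is typically a harder and more realistic evaluation than a random split. RMSE is then computed on the held-out indices, in the same log mol/L units as the target.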

References

  1. Delaney, John S., "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure", Journal of Chemical Information and Computer Sciences 44.3 (2004): 1000–1005
  2. Wu, Zhenqin, et al., "MoleculeNet: a benchmark for molecular machine learning", Chemical Science 9.2 (2018): 513–530