Dataset asset | Open Source Community | Machine Learning | Chemistry
scikit-fingerprints/MoleculeNet_ESOL
The MoleculeNet ESOL dataset is part of the MoleculeNet benchmark for predicting aqueous solubility. The target values are log‑transformed, expressed as log mol/L. The dataset contains 1,128 samples; scaffold split is recommended; evaluation metric is RMSE.
Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 18, 2024
Overview
Dataset description and usage context
MoleculeNet ESOL Dataset Overview
Basic Information
- Dataset Name: MoleculeNet ESOL
- Task Types:
  - Tabular regression
  - Graph machine learning
  - Tabular classification
- Tags:
  - Chemistry
  - Biology
  - Medicine
- Size: 1K < n < 10K
- Configuration:
  - Config name: default
  - Data files:
    - Split: train
    - Path: "esol.csv"
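The configuration above points at a single CSV file, so a minimal loading sketch needs only the standard library. The column names `SMILES` and `label` are assumptions (the card does not list the columns), and the two sample rows are placeholders rather than real dataset entries.

```python
import csv
import io


def read_esol(handle, smiles_col="SMILES", target_col="label"):
    """Read molecule strings and regression targets from an open CSV handle.

    The column names are assumptions; adjust them to match the real header.
    """
    reader = csv.DictReader(handle)
    smiles, targets = [], []
    for row in reader:
        smiles.append(row[smiles_col])
        targets.append(float(row[target_col]))
    return smiles, targets


# Placeholder rows, not taken from the dataset, so the sketch runs offline.
sample = io.StringIO("SMILES,label\nCCO,0.5\nc1ccccc1,-1.5\n")
smiles, targets = read_esol(sample)

# With a local copy of the file, the call would look like:
# with open("esol.csv", newline="") as f:
#     smiles, targets = read_esol(f)
```

If the Hugging Face `datasets` package is installed, `load_dataset("scikit-fingerprints/MoleculeNet_ESOL")` should be an alternative route to the same `train` split.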
Task Description
- Task: Predict aqueous solubility
- Target: Log-transformed solubility, in log mol/L (logarithm of the solubility in moles per litre)
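Because the target is on a log scale, it can help to convert predictions back to a concentration when interpreting them. The sketch below assumes a base-10 logarithm, which is the usual convention for ESOL but is not stated on this card; the function names are illustrative.

```python
import math


def log_s_to_mol_per_litre(log_s):
    """Convert a log solubility value to mol/L (assumes base-10 log)."""
    return 10.0 ** log_s


def mol_per_litre_to_log_s(solubility):
    """Convert a solubility in mol/L to its base-10 log value."""
    if solubility <= 0:
        raise ValueError("solubility must be positive")
    return math.log10(solubility)
```

Under this assumption, an error of 1.0 in the target corresponds to a tenfold error in the predicted concentration.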
Dataset Features
- Number of Tasks: 1
- Task Type: Regression
- Total Samples: 1,128
- Recommended Split: scaffold
- Recommended Metric: RMSE
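The recommended split and metric can be sketched as follows. This is a simplified two-way, greedy scaffold split, not necessarily the exact procedure used by MoleculeNet. It assumes one scaffold string per molecule has already been computed (for example, Bemis-Murcko scaffolds from a cheminformatics toolkit); that step is not shown. The function names are illustrative.

```python
from collections import defaultdict
from math import sqrt


def scaffold_split(scaffolds, train_frac=0.8):
    """Split sample indices so that no scaffold appears in both subsets.

    `scaffolds` holds one precomputed scaffold string per sample (assumption).
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    train_cutoff = train_frac * len(scaffolds)
    train, test = [], []
    for group in ordered:
        # A group joins the training set only if it fits under the cutoff.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))
```

Grouping by scaffold keeps structurally similar molecules on the same side of the split, which generally makes the test RMSE a stricter estimate than a random split would give.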
References
- Delaney, John S., "ESOL: Estimating Aqueous Solubility Directly from Molecular Structure", Journal of Chemical Information and Computer Sciences 44.3 (2004): 1000–1005
- Wu, Zhenqin, et al., "MoleculeNet: a benchmark for molecular machine learning", Chemical Science 9.2 (2018): 513–530