Dataset asset, Open Source Community, Molecular Property Prediction, Graph Neural Networks

OGB/ogbg-molhiv

`ogbg-molhiv` is a small molecular property prediction dataset adapted from MoleculeNet by the Stanford team for the Open Graph Benchmark. The task is binary classification: predict whether a molecule inhibits HIV, with ROC-AUC as the evaluation metric. The dataset comprises 41,127 graphs, each with node features, edge indices, edge attributes, and labels, and follows the split provided with the PyTorch Geometric version of the dataset.

Source: hugging_face
Created: Nov 28, 2025
Updated: Feb 7, 2023
Signals: 419 views
Availability: Linked source ready
Dataset Overview

Dataset Name

ogbg-molhiv

Dataset Summary

ogbg-molhiv is a small molecular property prediction dataset adapted by the Stanford team from MoleculeNet for the Open Graph Benchmark.

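Supported Tasks and Leaderboards

ogbg-molhiv supports binary classification: predicting whether a molecule inhibits HIV. Models are scored with ROC-AUC.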
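
As a rough illustration of the ROC-AUC metric used for this task, the sketch below computes it in plain Python from binary labels and predicted scores, using the pairwise-ranking formulation. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not part of the dataset or its tooling; for real evaluation a library routine would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    This pairwise loop is O(P * N), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up values: 1 = inhibits HIV, 0 = does not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.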

Dataset Structure

Data Attributes
  • Scale: Small
  • Number of Graphs: 41,127
  • Average Nodes per Graph: 25.5
  • Average Edges per Graph: 27.5
  • Average Node Degree: 2.2
  • Average Clustering Coefficient: 0.002
  • Largest Strongly Connected Component Ratio: 0.993
  • Graph Diameter: 12.0
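
The listed averages can be cross-checked with a short calculation. The sketch below assumes that the edge count refers to undirected edges, so each edge adds to the degree of two nodes; that reading is an assumption rather than something stated in the list. Dividing one average by the other also only approximates the true per-graph average degree.

```python
# Values copied from the attribute list above.
avg_nodes = 25.5
avg_edges = 27.5

# Assumption: edges are undirected, so each one adds 1 to the degree of two nodes.
approx_avg_degree = 2 * avg_edges / avg_nodes
print(round(approx_avg_degree, 1))
```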
Data Fields
  • node_feat (list: #nodes × #node-features)
  • edge_index (list: 2 × #edges)
  • edge_attr (list: #edges × #edge-features)
  • y (list: 1 × #labels)
  • num_nodes (integer)
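
To make the field shapes concrete, here is a minimal sketch of a single record as a plain Python dictionary, together with a helper that checks the shapes listed above. The toy graph, its feature values and widths, and the helper name `check_graph` are all hypothetical; the interpretation of the two `edge_index` rows as source and target node ids is also an assumption.

```python
def check_graph(graph):
    """Return True if the record's fields have mutually consistent shapes."""
    n = graph["num_nodes"]
    node_feat = graph["node_feat"]
    edge_index = graph["edge_index"]
    edge_attr = graph["edge_attr"]

    # node_feat: one feature row per node
    if len(node_feat) != n:
        return False
    # edge_index: two rows of equal length (assumed: source ids, target ids)
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        return False
    # edge_attr: one feature row per edge entry
    if len(edge_attr) != len(edge_index[0]):
        return False
    # every endpoint must be a valid node id
    return all(0 <= v < n for row in edge_index for v in row)


# Hypothetical toy record: 3 nodes, 2 edge entries, made-up feature values.
toy_graph = {
    "node_feat": [[6, 0], [6, 0], [8, 1]],
    "edge_index": [[0, 1], [1, 2]],
    "edge_attr": [[0], [1]],
    "y": [[0]],
    "num_nodes": 3,
}
print(check_graph(toy_graph))
```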
Data Split

The dataset follows the split provided with the PyTorch Geometric (PyG) version of the OGB dataset, which can be accessed as follows:

from ogb.graphproppred import PygGraphPropPredDataset

dataset = PygGraphPropPredDataset(name='ogbg-molhiv')
split_idx = dataset.get_idx_split()
train = dataset[split_idx['train']]  # similarly for 'valid' and 'test'
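
Once a split is loaded, it can be worth checking that the three index sets do not overlap. The sketch below does this on plain Python lists. The helper name `splits_are_disjoint` and the toy indices are hypothetical, and it assumes the split is a dictionary with 'train', 'valid', and 'test' keys as in the snippet above. Index tensors from the real loader would need converting to lists first, for example with `.tolist()`.

```python
def splits_are_disjoint(split_idx):
    """Return True if no index appears in more than one split."""
    seen = set()
    for name in ("train", "valid", "test"):
        indices = set(split_idx[name])
        if seen & indices:
            # At least one index was already used by an earlier split.
            return False
        seen |= indices
    return True


# Hypothetical toy split over 8 graphs.
toy_split = {"train": [0, 1, 2, 3], "valid": [4, 5], "test": [6, 7]}
ok = splits_are_disjoint(toy_split)
print(ok)
```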

Additional Information

License

Released under the MIT License.

Citation
@inproceedings{hu-etal-2020-open,
  author    = {Weihua Hu and
               Matthias Fey and
               Marinka Zitnik and
               Yuxiao Dong and
               Hongyu Ren and
               Bowen Liu and
               Michele Catasta and
               Jure Leskovec},
  editor    = {Hugo Larochelle and
               Marc Aurelio Ranzato and
               Raia Hadsell and
               Maria-Florina Balcan and
               Hsuan-Tien Lin},
  title     = {Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  booktitle = {Advances in Neural Information Processing Systems 33: Annual Conference
               on Neural Information Processing Systems 2020, NeurIPS 2020, December
               6-12, 2020, virtual},
  year      = {2020},
  url       = {https://proceedings.neurips.cc/paper/2020/hash/fb60d411a5c5b72b2e7d3527cfc84fd0-Abstract.html},
}
Contributors

Thanks to @clefourrier for adding this dataset.
