OGB/ogbg-molhiv
`ogbg-molhiv` is a small molecular property prediction dataset adapted from MoleculeNet by the Stanford team for the Open Graph Benchmark (OGB). The task is binary classification: predict whether a molecule inhibits HIV, with performance measured by ROC-AUC. The dataset comprises 41,127 graphs, each with node features, an edge index, edge attributes, and a label, and it follows the split provided with the PyTorch Geometric (PyG) version.
Dataset Overview
Dataset Name
ogbg-molhiv
Dataset Summary
ogbg-molhiv is a small molecular property prediction dataset that the Stanford team adapted from MoleculeNet for the Open Graph Benchmark (OGB).
Supported Tasks and Leaderboards
- Task Type: Molecular property prediction (binary classification: HIV inhibition).
- Evaluation Metric: ROC‑AUC.
- Leaderboards:
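To make the evaluation metric concrete, here is a minimal pure-Python sketch of ROC-AUC using the rank-sum (Mann-Whitney U) formulation. It is not the official scorer: the OGB package provides its own `Evaluator` class for this dataset. The function name `roc_auc` and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via rank sums; tied scores share their average rank."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    # Fraction of (positive, negative) pairs where the positive is ranked higher.
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (hypothetical labels and scores, not taken from the dataset).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to every inhibitor being scored above every non-inhibitor, which is why the metric depends only on the ordering of predictions and not on a decision threshold.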
Dataset Structure
Data Attributes
- Scale: Small
- Number of Graphs: 41,127
- Average Nodes per Graph: 25.5
- Average Edges per Graph: 27.5
- Average Node Degree: 2.2
- Average Clustering Coefficient: 0.002
- Largest Strongly Connected Component Ratio: 0.993
- Graph Diameter: 12.0
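The reported statistics can be cross-checked against each other. The sketch below assumes that the edge count treats each bond as a single undirected edge, so every edge contributes to the degree of two nodes. It also uses a ratio of averages, which only approximates the average of per-graph degrees. The function name `approx_average_degree` is hypothetical.

```python
def approx_average_degree(avg_nodes, avg_edges):
    """Approximate mean degree, assuming each undirected edge is counted once."""
    return 2 * avg_edges / avg_nodes


# Values taken from the Data Attributes list above.
print(round(approx_average_degree(25.5, 27.5), 1))  # → 2.2
```

The result agrees with the listed average node degree of 2.2, which is consistent with the sparse, near-tree structure typical of molecular graphs.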
Data Fields
- node_feat (list: #nodes × #node-features)
- edge_index (list: 2 × #edges)
- edge_attr (list: #edges × #edge-features)
- y (list: 1 × #labels)
- num_nodes (integer)
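The shapes of these fields constrain one another, which the following sketch illustrates with a toy record. Everything here is hypothetical: the helper `validate_graph`, the feature values, and the feature widths are invented for illustration. Storing each bond in both directions in `edge_index` is likewise an assumption about the encoding, not something stated in the field list.

```python
def validate_graph(graph):
    """Raise ValueError if the record's fields have inconsistent shapes."""
    num_nodes = graph["num_nodes"]
    sources, targets = graph["edge_index"]  # two parallel lists of node ids

    if len(graph["node_feat"]) != num_nodes:
        raise ValueError("node_feat must have one row per node")
    if not (len(sources) == len(targets) == len(graph["edge_attr"])):
        raise ValueError("edge_index columns and edge_attr rows must match")
    if any(not 0 <= n < num_nodes for n in sources + targets):
        raise ValueError("edge_index refers to a node id outside the graph")
    if len(graph["y"]) != 1:
        raise ValueError("y must have shape 1 x #labels")
    return True


# Hypothetical three-atom chain: 0 - 1 - 2, with each bond listed in both directions.
toy_graph = {
    "node_feat": [[6, 0], [8, 0], [6, 1]],       # 3 nodes x 2 made-up features
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],  # 2 x 4 directed entries
    "edge_attr": [[0], [0], [1], [1]],           # 4 edges x 1 made-up feature
    "y": [[0]],                                  # 1 x 1 binary label
    "num_nodes": 3,
}

print(validate_graph(toy_graph))  # → True
```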
Data Split
The dataset follows the split provided with the PyTorch Geometric (PyG) version, which can be accessed as follows:
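from ogb.graphproppred import PygGraphPropPredDataset

# Downloads the dataset on first use and loads it as PyG graphs
dataset = PygGraphPropPredDataset(name='ogbg-molhiv')
# Dictionary of index tensors keyed by 'train', 'valid', and 'test'
split_idx = dataset.get_idx_split()
train = dataset[split_idx['train']]  # similarly for 'valid' and 'test'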
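Before training, it can be worth confirming that the three index sets do not overlap. The helper below is a hypothetical sketch (`summarize_split` is not part of OGB) that works on any mapping from split name to an iterable of integer-like indices. It is shown here with plain lists rather than the real split so that it runs without downloading anything.

```python
def summarize_split(split_idx):
    """Return the size of each subset after checking that no index is shared."""
    seen = set()
    sizes = {}
    for name, indices in split_idx.items():
        idx = {int(i) for i in indices}
        overlap = seen & idx
        if overlap:
            raise ValueError(f"split '{name}' reuses indices: {sorted(overlap)}")
        seen |= idx
        sizes[name] = len(idx)
    return sizes


# Toy stand-in for the real split dictionary (made-up indices).
toy_split = {"train": [0, 1, 2, 3], "valid": [4], "test": [5]}
print(summarize_split(toy_split))  # → {'train': 4, 'valid': 1, 'test': 1}
```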
Additional Information
License
Released under the MIT License.
Citation
@inproceedings{hu-etal-2020-open,
author = {Weihua Hu and
Matthias Fey and
Marinka Zitnik and
Yuxiao Dong and
Hongyu Ren and
Bowen Liu and
Michele Catasta and
Jure Leskovec},
editor = {Hugo Larochelle and
Marc Aurelio Ranzato and
Raia Hadsell and
Maria-Florina Balcan and
Hsuan-Tien Lin},
title = {Open Graph Benchmark: Datasets for Machine Learning on Graphs},
booktitle = {Advances in Neural Information Processing Systems 33: Annual Conference
on Neural Information Processing Systems 2020, NeurIPS 2020, December
6-12, 2020, virtual},
year = {2020},
url = {https://proceedings.neurips.cc/paper/2020/hash/fb60d411a5c5b72b2e7d3527cfc84fd0-Abstract.html},
}
Contributors
Thanks to @clefourrier for adding this dataset.