OGB/ogbg-molhiv
`ogbg‑molhiv` is a small molecular property prediction dataset adapted from MoleculeNet by the Stanford team for the Open Graph Benchmark (OGB). The task is binary classification: predict whether a molecule inhibits HIV, with performance evaluated by ROC‑AUC. The dataset comprises 41,127 graphs, each with node features, an edge index, edge attributes, and a label. It uses the data split provided with the PyTorch Geometric (PyG) version of the dataset.
Description
Dataset Overview
Dataset Name
ogbg‑molhiv
Dataset Summary
ogbg‑molhiv is a small molecular property prediction dataset adapted by the Stanford team from MoleculeNet for the Open Graph Benchmark.
Supported Tasks and Leaderboards
- Task Type: Molecular property prediction (binary classification: HIV inhibition).
- Evaluation Metric: ROC‑AUC.
- Leaderboards:
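As a reminder of what the metric measures, ROC‑AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below implements that definition in plain Python on invented toy labels and scores; the function name `roc_auc` is hypothetical. For reported results, an established implementation such as the evaluator shipped with the `ogb` package is the safer choice.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration (not taken from the dataset).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)  # 0.75
```

The pairwise loop is quadratic, which is fine for a toy example but not for evaluating a full test set.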
Dataset Structure
Data Attributes
- Scale: Small
- Number of Graphs: 41,127
- Average Nodes per Graph: 25.5
- Average Edges per Graph: 27.5
- Average Node Degree: 2.2
- Average Clustering Coefficient: 0.002
- Largest Strongly Connected Component Ratio: 0.993
- Graph Diameter: 12.0
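The node, edge, and degree figures listed above can be cross-checked with a short calculation. This is a rough sketch that assumes each edge is undirected and counted once, and that the ratio of the two averages approximates the average degree; neither assumption is stated in the card.

```python
avg_nodes = 25.5   # Average Nodes per Graph, as listed above
avg_edges = 27.5   # Average Edges per Graph, as listed above

# Assumption: an undirected edge adds 1 to the degree of each of its endpoints.
avg_degree = 2 * avg_edges / avg_nodes
print(round(avg_degree, 1))  # 2.2
```

Under these assumptions the result agrees with the listed average node degree of 2.2.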
Data Fields
- `node_feat` (list: #nodes × #node‑features)
- `edge_index` (list: 2 × #edges)
- `edge_attr` (list: #edges × #edge‑features)
- `y` (list: 1 × #labels)
- `num_nodes` (integer)
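To make the shapes concrete, the sketch below builds one hypothetical graph record with these fields and checks that the fields agree with each other. The helper `validate_graph`, the feature values, and the feature widths are all invented for illustration, and storing each bond in both directions is an assumption rather than something the card states.

```python
def validate_graph(graph):
    """Check that the fields of one graph record are mutually consistent."""
    n = graph["num_nodes"]
    sources, targets = graph["edge_index"]  # 2 x #edges
    if len(graph["node_feat"]) != n:
        raise ValueError("node_feat must have one row per node")
    if len(sources) != len(targets):
        raise ValueError("both rows of edge_index must have the same length")
    if len(graph["edge_attr"]) != len(sources):
        raise ValueError("edge_attr must have one row per edge")
    if any(not 0 <= i < n for i in sources + targets):
        raise ValueError("edge_index refers to a node that does not exist")
    return True


# Hypothetical three-atom chain 0-1-2, with each bond stored in both directions.
# Feature values and widths are placeholders, not the dataset's real encoding.
toy_graph = {
    "node_feat": [[6, 0], [8, 0], [6, 1]],
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
    "edge_attr": [[0], [0], [1], [1]],
    "y": [[0]],
    "num_nodes": 3,
}
is_valid = validate_graph(toy_graph)
```

Column `j` of `edge_index` and row `j` of `edge_attr` describe the same edge, which is why their lengths must match.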
Data Split
The dataset follows the split provided with the PyTorch Geometric (PyG) version of the dataset, which can be accessed through the `ogb` package as follows:
```python
from ogb.graphproppred import PygGraphPropPredDataset

dataset = PygGraphPropPredDataset(name='ogbg-molhiv')
# Split indices keyed by 'train', 'valid', and 'test'
split_idx = dataset.get_idx_split()
train = dataset[split_idx['train']]  # similarly for 'valid' and 'test'
```
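The index-based pattern used above can be illustrated without downloading anything. The sketch below uses a plain dictionary of index lists as a stand-in for the object returned by `get_idx_split()`; the indices, the ten-item toy dataset, and the helper `select` are invented for illustration.

```python
def select(items, indices):
    """Return the items at the given positions, in the order listed."""
    return [items[i] for i in indices]


# Stand-in for dataset.get_idx_split(): index lists keyed by split name.
# These indices are invented for a 10-item toy dataset.
split_idx = {"train": [0, 1, 2, 3, 4, 5, 6, 7], "valid": [8], "test": [9]}
toy_dataset = [f"graph_{i}" for i in range(10)]

subsets = {name: select(toy_dataset, idx) for name, idx in split_idx.items()}

# Sanity check: every item lands in exactly one split.
all_indices = sorted(i for idx in split_idx.values() for i in idx)
assert all_indices == list(range(len(toy_dataset)))
```

Checking that the splits are disjoint and cover the whole dataset is a cheap guard against accidental leakage between training and evaluation data.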
Additional Information
License
Released under the MIT License.
Citation
@inproceedings{hu-etal-2020-open,
author = {Weihua Hu and
Matthias Fey and
Marinka Zitnik and
Yuxiao Dong and
Hongyu Ren and
Bowen Liu and
Michele Catasta and
Jure Leskovec},
editor = {Hugo Larochelle and
Marc Aurelio Ranzato and
Raia Hadsell and
Maria‑Florina Balcan and
Hsuan‑Tien Lin},
title = {Open Graph Benchmark: Datasets for Machine Learning on Graphs},
booktitle = {Advances in Neural Information Processing Systems 33: Annual Conference
on Neural Information Processing Systems 2020, NeurIPS 2020, December
6‑12, 2020, virtual},
year = {2020},
url = {https://proceedings.neurips.cc/paper/2020/hash/fb60d411a5c5b72b2e7d3527cfc84fd0-Abstract.html},
}
Contributors
Thanks to @clefourrier for adding this dataset.
Source
Organization: hugging_face