OGB/ogbg-molhiv

Dataset Overview

Dataset Name

ogbg-molhiv

Dataset Summary

ogbg-molhiv is a small molecular property prediction dataset, adapted from MoleculeNet by the Stanford team for the Open Graph Benchmark (OGB). Each graph is a molecule: nodes are atoms and edges are chemical bonds.

Supported Tasks and Leaderboards

Task Type: Molecular property prediction (binary classification: whether a molecule inhibits HIV replication).
Evaluation Metric: ROC-AUC.
Leaderboards:
- OGB leaderboard
- Papers with Code leaderboard
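ROC-AUC is the probability that a randomly chosen positive molecule is scored higher than a randomly chosen negative one, with ties counting one half. The sketch below computes it from ranks via the Mann-Whitney U statistic in plain Python. `roc_auc` is a hypothetical helper for illustration, not part of OGB; for reportable numbers, OGB's own `Evaluator(name='ogbg-molhiv')` from `ogb.graphproppred` should be used instead.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC is undefined when only one class is present")

    # Sort indices by score and assign 1-based ranks, averaging over tied scores.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U = (sum of positive ranks) - n_pos(n_pos+1)/2; normalise by #pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ordered correctly.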

Dataset Structure

Data Attributes

Scale: Small
Number of Graphs: 41,127
Average Nodes per Graph: 25.5
Average Edges per Graph: 27.5
Average Node Degree: 2.2
Average Clustering Coefficient: 0.002
Largest Strongly Connected Component Ratio: 0.993
Graph Diameter: 12.0
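The averages above are related: since every bond touches two atoms, the average node degree can be derived from the edge and node counts. The sketch below computes pooled averages over a list of graph records. It assumes each record is a dict with `num_nodes` and `edge_index`, and that every bond appears twice in `edge_index` (once per direction), as in OGB's SMILES-to-graph conversion; `summary_stats` is a hypothetical helper, not part of any library.

```python
from typing import Dict, Sequence


def summary_stats(graphs: Sequence[dict]) -> Dict[str, float]:
    """Pooled averages over graph records.

    Assumes each record has 'num_nodes' (int) and 'edge_index'
    (2 x #directed_edges), with every undirected bond stored twice.
    """
    if not graphs:
        raise ValueError("need at least one graph")
    total_nodes = 0
    total_bonds = 0
    for g in graphs:
        total_nodes += g["num_nodes"]
        # Two directed edges per bond, so halve the column count.
        total_bonds += len(g["edge_index"][0]) // 2
    n = len(graphs)
    return {
        "avg_nodes": total_nodes / n,
        "avg_edges": total_bonds / n,
        # Each bond contributes one to the degree of both endpoints.
        "avg_degree": 2 * total_bonds / total_nodes,
    }
```

With the card's own averages, the pooled formula gives 2 × 27.5 / 25.5 ≈ 2.16, consistent with the rounded 2.2 listed above. If the published figure is a per-graph mean rather than a pooled ratio, small differences are expected.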

Data Fields

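node_feat (list: #nodes × #node-features): integer-encoded atom features, one row per atom (9 features per atom in OGB's featurization)
edge_index (list: 2 × #edges): source and target node indices of each edge
edge_attr (list: #edges × #edge-features): integer-encoded bond features, one row per edge in edge_index (3 features per bond in OGB's featurization)
y (list: 1 × #labels): the label to predict; here a single binary value (0 or 1)
num_nodes (integer): number of nodes (atoms) in the graph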
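The field shapes above can be checked mechanically when loading raw records. The sketch below assumes each record is a plain dict keyed by the field names listed, with `y` stored as a nested list `[[label]]` for ogbg-molhiv's single binary label; `check_record` is a hypothetical helper for illustration.

```python
from typing import Any, Dict, Tuple


def check_record(rec: Dict[str, Any]) -> Tuple[int, int, int]:
    """Validate one graph record against the listed field shapes.

    Returns (num_nodes, num_edges, label). Assumes a single binary
    label, so y has the shape [[label]].
    """
    n = rec["num_nodes"]
    node_feat, edge_index = rec["node_feat"], rec["edge_index"]
    edge_attr, y = rec["edge_attr"], rec["y"]
    if len(node_feat) != n:
        raise ValueError("node_feat must have one row per node")
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        raise ValueError("edge_index must have shape 2 x #edges")
    num_edges = len(edge_index[0])
    if len(edge_attr) != num_edges:
        raise ValueError("edge_attr must have one row per edge")
    # Every edge endpoint must be a valid node index.
    for src, dst in zip(edge_index[0], edge_index[1]):
        if not (0 <= src < n and 0 <= dst < n):
            raise ValueError(f"edge ({src}, {dst}) points outside the graph")
    if len(y) != 1 or len(y[0]) != 1 or y[0][0] not in (0, 1):
        raise ValueError("y must be [[0]] or [[1]] for a single binary label")
    return n, num_edges, y[0][0]
```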

Data Split

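The dataset follows the official OGB split, a scaffold split that groups molecules by their two-dimensional structural framework so that the validation and test sets contain molecules structurally different from those in the training set. The data comes from the PyTorch Geometric version of the dataset provided by OGB, and the split indices can be accessed as follows: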

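```python
from ogb.graphproppred import PygGraphPropPredDataset

dataset = PygGraphPropPredDataset(name='ogbg-molhiv')  # downloads on first use
split_idx = dataset.get_idx_split()  # dict: 'train' / 'valid' / 'test' -> index tensors
train_dataset = dataset[split_idx['train']]
valid_dataset = dataset[split_idx['valid']]
test_dataset = dataset[split_idx['test']]
```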
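A quick sanity check on the split is that the train, validation, and test index sets are disjoint. The sketch below takes the split as a dict of plain integer lists (OGB returns tensors, which can be converted with `.tolist()`) and additionally assumes the three sets together cover every graph index exactly once; `check_split` is a hypothetical helper, not part of OGB.

```python
from typing import Dict, Sequence


def check_split(split_idx: Dict[str, Sequence[int]], num_graphs: int) -> Dict[str, int]:
    """Check that train/valid/test are disjoint and cover range(num_graphs).

    Returns the size of each split.
    """
    seen = set()
    sizes = {}
    for name in ("train", "valid", "test"):
        idx = list(split_idx[name])
        ids = set(idx)
        if len(ids) != len(idx):
            raise ValueError(f"duplicate indices inside '{name}'")
        overlap = seen & ids
        if overlap:
            raise ValueError(f"'{name}' overlaps an earlier split ({len(overlap)} indices)")
        seen |= ids
        sizes[name] = len(idx)
    # Assumption: every graph belongs to exactly one split.
    if seen != set(range(num_graphs)):
        raise ValueError("splits do not cover every graph index exactly once")
    return sizes
```

With the OGB loader, a call might look like `check_split({k: v.tolist() for k, v in split_idx.items()}, len(dataset))`.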

Additional Information

License

Released under the MIT License.

Citation

```bibtex
@inproceedings{hu-etal-2020-open,
  author    = {Weihua Hu and
               Matthias Fey and
               Marinka Zitnik and
               Yuxiao Dong and
               Hongyu Ren and
               Bowen Liu and
               Michele Catasta and
               Jure Leskovec},
  editor    = {Hugo Larochelle and
               Marc'Aurelio Ranzato and
               Raia Hadsell and
               Maria-Florina Balcan and
               Hsuan-Tien Lin},
  title     = {Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  booktitle = {Advances in Neural Information Processing Systems 33: Annual Conference
               on Neural Information Processing Systems 2020, NeurIPS 2020, December
               6-12, 2020, virtual},
  year      = {2020},
  url       = {https://proceedings.neurips.cc/paper/2020/hash/fb60d411a5c5b72b2e7d3527cfc84fd0-Abstract.html},
}
```

Contributors

Thanks to @clefourrier for adding this dataset.

Description