OGB/ogbg-molhiv

`ogbg-molhiv` is a small molecular property prediction dataset adapted from MoleculeNet by the Stanford team for the Open Graph Benchmark. It is a binary classification task: predict whether a molecule inhibits HIV. Performance is evaluated with ROC-AUC. The dataset comprises 41,127 graphs, each with node features, edge indices, edge attributes, and labels, and follows the PyTorch Geometric (PyG) split.

Updated 2/7/2023
hugging_face

Description

Dataset Overview

Dataset Name

ogbg-molhiv

Dataset Summary

ogbg-molhiv is a small molecular property prediction dataset adapted by the Stanford team from MoleculeNet for the Open Graph Benchmark.

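Supported Tasks and Leaderboards

Binary classification: predict whether a molecule inhibits HIV. Performance is evaluated with ROC-AUC.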
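
Because the task is scored with ROC-AUC, it can help to see what the metric measures. The following is a minimal pure-Python sketch, not the official OGB evaluator: it computes ROC-AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    Runs in O(P * N) time, which is fine for illustration. Ties count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores, not taken from ogbg-molhiv).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of inhibitors above non-inhibitors.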

Dataset Structure

Data Attributes
  • Scale: Small
  • Number of Graphs: 41,127
  • Average Nodes per Graph: 25.5
  • Average Edges per Graph: 27.5
  • Average Node Degree: 2.2
  • Average Clustering Coefficient: 0.002
  • Largest Strongly Connected Component Ratio: 0.993
  • Graph Diameter: 12.0
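
The reported average node degree is roughly consistent with the node and edge averages if each edge is treated as undirected, so that it contributes to the degree of both endpoints. The short check below is a sketch under that assumption. It uses the ratio of the two averages rather than a true per-graph average, and the variable names are illustrative.

```python
avg_nodes = 25.5  # average nodes per graph, from the list above
avg_edges = 27.5  # average edges per graph, from the list above

# Assumption: each undirected edge adds one to the degree of two nodes.
avg_degree = 2 * avg_edges / avg_nodes
rounded = round(avg_degree, 1)
```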
Data Fields
  • node_feat (list: #nodes × #node‑features)
  • edge_index (list: 2 × #edges)
  • edge_attr (list: #edges × #edge‑features)
  • y (list: 1 × #labels)
  • num_nodes (integer)
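
The shapes listed above constrain one another: `node_feat` has one row per node, and `edge_attr` has one row per column of `edge_index`. The following sketch checks those constraints on a single record represented as a plain dictionary. The helper name `check_graph` and the toy record are hypothetical; the feature values and widths are made up for brevity, and storing each bond in both directions is an assumption of the toy example.

```python
def check_graph(record):
    """Raise ValueError if the fields of one graph record disagree in shape."""
    num_nodes = record["num_nodes"]
    node_feat = record["node_feat"]
    edge_index = record["edge_index"]
    edge_attr = record["edge_attr"]

    if len(node_feat) != num_nodes:
        raise ValueError("node_feat must have one row per node")
    if len(edge_index) != 2:
        raise ValueError("edge_index must have two rows (source, target)")
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have equal length")
    if len(edge_attr) != len(sources):
        raise ValueError("edge_attr must have one row per edge")
    if any(not 0 <= i < num_nodes for i in sources + targets):
        raise ValueError("edge_index refers to a node outside the graph")
    return True


# Toy 3-node path graph 0 - 1 - 2 with made-up feature values.
toy = {
    "node_feat": [[6, 0], [6, 0], [8, 0]],
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
    "edge_attr": [[0], [0], [0], [0]],
    "y": [[0]],
    "num_nodes": 3,
}
ok = check_graph(toy)
```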
Data Split

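The dataset follows the PyTorch Geometric (PyG) split, which can be accessed through the OGB loader as follows:

from ogb.graphproppred import PygGraphPropPredDataset

# Load the dataset through the OGB loader for PyTorch Geometric.
dataset = PygGraphPropPredDataset(name='ogbg-molhiv')
# Indices of the graphs that belong to each split.
split_idx = dataset.get_idx_split()
train = dataset[split_idx['train']]
valid = dataset[split_idx['valid']]
test = dataset[split_idx['test']]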
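
A split dictionary of this kind is expected to assign each graph to exactly one subset. The sketch below shows one way to verify that property on plain Python lists of indices. The helper name `check_split` and the toy indices are hypothetical, and the code assumes the index containers have already been converted to lists of integers.

```python
def check_split(split_idx, num_graphs):
    """Return the size of each split after checking the splits partition the dataset."""
    seen = set()
    sizes = {}
    for name, indices in split_idx.items():
        indices = list(indices)
        if seen.intersection(indices):
            raise ValueError(f"split '{name}' overlaps with an earlier split")
        seen.update(indices)
        sizes[name] = len(indices)
    # Every graph must appear exactly once across all splits.
    if sum(sizes.values()) != num_graphs or seen != set(range(num_graphs)):
        raise ValueError("splits do not cover every graph exactly once")
    return sizes


# Toy split over 10 graphs (made-up indices, not the real ogbg-molhiv split).
toy_split = {"train": [0, 1, 2, 3, 4, 5], "valid": [6, 7], "test": [8, 9]}
sizes = check_split(toy_split, 10)
```

With the real dataset, one would presumably convert each index tensor first, for example with `{k: v.tolist() for k, v in split_idx.items()}`, and pass `len(dataset)` as `num_graphs`.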

Additional Information

License

Released under the MIT License.

Citation
@inproceedings{hu-etal-2020-open,
  author    = {Weihua Hu and
               Matthias Fey and
               Marinka Zitnik and
               Yuxiao Dong and
               Hongyu Ren and
               Bowen Liu and
               Michele Catasta and
               Jure Leskovec},
  editor    = {Hugo Larochelle and
               Marc Aurelio Ranzato and
               Raia Hadsell and
               Maria-Florina Balcan and
               Hsuan-Tien Lin},
  title     = {Open Graph Benchmark: Datasets for Machine Learning on Graphs},
  booktitle = {Advances in Neural Information Processing Systems 33: Annual Conference
               on Neural Information Processing Systems 2020, NeurIPS 2020, December
               6-12, 2020, virtual},
  year      = {2020},
  url       = {https://proceedings.neurips.cc/paper/2020/hash/fb60d411a5c5b72b2e7d3527cfc84fd0-Abstract.html},
}
Contributors

Thanks to @clefourrier for adding this dataset.

Topics

Molecular Property Prediction
Graph Neural Networks

Source

Organization: hugging_face

Created: Unknown
