Dataset asset: Open Source Community, Chemistry, Materials Science

materials-toolkits/materials-project

This dataset contains per‑atom formation energy data for 133,420 materials. It is provided as two main files: `index.json`, which includes material indices, IDs, formulas, atom counts, and per‑atom formation energies; and `data.hdf5`, which stores structural information (lattice, number of atoms, per‑atom energy, atom pointers) and atomic data (positions, atomic numbers).

Source: hugging_face
Created: Nov 28, 2025
Updated: Feb 7, 2024
Dataset Overview

Dataset Name

Materials Project (2019 dump)

Dataset Description

This dataset contains per‑atom formation energy data for 133,420 materials.

Data Source

Data processed from mp.2019.04.01.json.

Download Link

materials-project.tar.gz

MD5 Checksum

c132f3781f32cd17f3a92aa6501b9531
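
The downloaded archive can be checked against the checksum above before unpacking. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the local file path in the usage comment are illustrative assumptions, not part of the dataset.

```python
import hashlib

# Checksum as listed on this page for materials-project.tar.gz.
EXPECTED_MD5 = "c132f3781f32cd17f3a92aa6501b9531"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed usage, with the archive saved in the current directory:
#   ok = md5_of_file("materials-project.tar.gz") == EXPECTED_MD5
```

Reading in chunks keeps memory use flat regardless of archive size.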

Data Content

The dataset is packaged in materials-project.tar.gz.

Index File (index.json)

Contains the following fields:

  • index (int): Index of the structure in the data file.
  • id (str): Materials Project ID.
  • formula (str): Chemical formula.
  • natoms (int): Number of atoms.
  • energy_pa (float): Formation energy per atom.
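
The fields above might be used to select materials before touching the larger HDF5 file. This sketch assumes that `index.json` is a JSON array of objects, one per material, keyed by the field names listed above; the page does not state the top-level layout, so adjust if the file is organized differently. The function names are hypothetical.

```python
import json


def load_index(path):
    """Load index.json (assumed: a JSON array of per-material records)."""
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def select_by_energy(records, max_energy_pa):
    """Keep records whose formation energy per atom is <= max_energy_pa."""
    return [r for r in records if r["energy_pa"] <= max_energy_pa]


def structure_indices(records):
    """Return the `index` values, i.e. the positions to look up in data.hdf5."""
    return [r["index"] for r in records]


# Assumed usage:
#   records = load_index("index.json")
#   stable = select_by_energy(records, max_energy_pa=0.0)
#   rows = structure_indices(stable)
```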

Data File (data.hdf5)

Contains the following fields:

  • structures: Group containing structural information.
    • structures/cell (float32): Lattice of the material.
    • structures/natoms (int32): Number of atoms.
    • structures/energy_pa (float32): Formation energy per atom.
    • structures/atoms_ptr (int64): Index of the structure's first atom in the atoms arrays.
  • atoms: Group containing atomic information.
    • atoms/positions (float32): Atom positions.
    • atoms/atomic_number (uint8): Atomic numbers of the atoms.
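
The layout above suggests that each structure's atoms occupy a contiguous block in the `atoms` arrays. The sketch below assumes that structure `i` spans `atoms_ptr[i]` up to `atoms_ptr[i] + natoms[i]`, that the per-structure datasets are one entry per structure along the first axis, and that the third-party `h5py` package is installed. These are readings of the field descriptions, not confirmed by the page, and the function names are hypothetical.

```python
def atom_slice(atoms_ptr, natoms, i):
    """Slice selecting the atoms of structure i (assumed contiguous layout)."""
    start = int(atoms_ptr[i])
    return slice(start, start + int(natoms[i]))


def read_structure(path, i):
    """Read one structure from data.hdf5 into a dict of NumPy values."""
    import h5py  # third-party; imported here so atom_slice works without it

    with h5py.File(path, "r") as f:
        structures = f["structures"]
        atoms = f["atoms"]
        sl = atom_slice(structures["atoms_ptr"], structures["natoms"], i)
        return {
            "cell": structures["cell"][i],            # lattice of structure i
            "energy_pa": structures["energy_pa"][i],  # formation energy per atom
            "positions": atoms["positions"][sl],      # one row per atom
            "atomic_number": atoms["atomic_number"][sl],
        }


# Assumed usage, with i taken from the `index` field of index.json:
#   s = read_structure("data.hdf5", 0)
```

Indexing the HDF5 datasets with a slice reads only the requested rows from disk, so individual structures can be fetched without loading the full arrays.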