Dataset asset, Open Source Community, Crystal Structure, Materials Science
nimashoghi/wbm
The dataset provides materials-science features for each entry, including chemical formula, number of sites, volume, energy, and band gap, for use in materials property research and prediction. It contains 256,963 samples (total size 725 MB, download size 156 MB).
Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 21, 2024
Overview
Dataset Features
- formula: chemical formula (string).
- n_sites: number of sites (float).
- volume: volume (float).
- uncorrected_energy: uncorrected energy (float).
- e_form_per_atom_wbm: formation energy per atom (WBM) (float).
- e_above_hull_wbm: energy above hull (WBM) (float).
- bandgap_pbe: PBE band gap (float).
- wyckoff_spglib_initial_structure: Wyckoff symbol of the initial structure (string).
- uncorrected_energy_from_cse: uncorrected energy from CSE (float).
- e_correction_per_atom_mp2020: MP2020 per‑atom correction energy (float).
- e_correction_per_atom_mp_legacy: MP legacy per‑atom correction energy (float).
- e_form_per_atom_uncorrected: uncorrected formation energy per atom (float).
- e_form_per_atom_mp2020_corrected: MP2020 corrected formation energy per atom (float).
- e_above_hull_mp2020_corrected_ppd_mp: MP2020 corrected energy above hull (float).
- site_stats_fingerprint_init_final_norm_diff: normalized fingerprint difference between initial and final structures (float).
- wyckoff_spglib: Wyckoff symbol (string).
- unique_prototype: flag indicating a unique prototype (boolean).
- formula_from_cse: chemical formula from CSE (string).
- initial_structure: nested object describing the initial crystal structure (class, module, charge, lattice parameters, sites, etc.).
- id: identifier (string).
- material_id: material identifier (string).
- frac_pos, cart_pos, pos, cell: position and cell arrays (float).
- num_atoms: number of atoms (integer).
- atomic_numbers: atomic numbers (integer array).
- composition: composition (integer array).
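As a rough illustration of how the features above might be used for screening, the sketch below filters records by energy above hull and PBE band gap. It assumes each record is a plain dictionary keyed by the feature names listed above. The function name `select_candidates`, the thresholds, and the toy rows are hypothetical placeholders, not values from the dataset.

```python
from typing import Any, Iterable, Iterator, Mapping

# Column names are taken from the feature list; everything else is illustrative.
HULL_KEY = "e_above_hull_mp2020_corrected_ppd_mp"
GAP_KEY = "bandgap_pbe"


def select_candidates(
    rows: Iterable[Mapping[str, Any]],
    max_e_above_hull: float = 0.0,
    min_bandgap: float = 0.0,
) -> Iterator[Mapping[str, Any]]:
    """Yield rows at or below the hull threshold and at or above the gap threshold.

    Rows with a missing value in either column are skipped. NaN values fail
    both comparisons, so they are skipped as well.
    """
    for row in rows:
        e_hull = row.get(HULL_KEY)
        gap = row.get(GAP_KEY)
        if e_hull is None or gap is None:
            continue
        if e_hull <= max_e_above_hull and gap >= min_bandgap:
            yield row


# Toy rows with made-up values, only to show the expected record shape.
SAMPLE_ROWS = [
    {"id": "toy-1", "formula": "X2Y", HULL_KEY: -0.02, GAP_KEY: 1.1},
    {"id": "toy-2", "formula": "XY3", HULL_KEY: 0.15, GAP_KEY: 2.0},
    {"id": "toy-3", "formula": "XZ", HULL_KEY: -0.01, GAP_KEY: 0.0},
    {"id": "toy-4", "formula": "YZ2", HULL_KEY: None, GAP_KEY: 0.8},
]

selected = list(select_candidates(SAMPLE_ROWS, max_e_above_hull=0.0, min_bandgap=0.5))
print([row["id"] for row in selected])
```

The same generator works on any iterable of dictionaries, so it can be applied to records streamed from the source without holding the whole dataset in memory.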
Dataset Split
- all: 256,963 samples, total size 725 MB.
Size
- Download size: 156 MB.
- Dataset size: 725 MB.
Configuration
- default: includes all data files under data/all-*.
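A minimal sketch of how the split and configuration above could be loaded, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the id `nimashoghi/wbm`. The helper names `load_wbm` and `column_summary` are hypothetical. Streaming is chosen here only to avoid the full 156 MB download while exploring.

```python
import math
from typing import Any, Dict, Iterable, Mapping


def load_wbm(streaming: bool = True):
    """Open the 'all' split of the 'default' configuration.

    The import is local so the rest of this module works without the
    third-party `datasets` package installed.
    """
    from datasets import load_dataset  # assumption: pip install datasets

    return load_dataset(
        "nimashoghi/wbm", name="default", split="all", streaming=streaming
    )


def column_summary(rows: Iterable[Mapping[str, Any]], column: str) -> Dict[str, float]:
    """Return count, min, max, and mean of a numeric column, ignoring None and NaN."""
    values = [
        row[column]
        for row in rows
        if row.get(column) is not None and not math.isnan(row[column])
    ]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


# Toy rows with made-up values, used in place of a network call.
toy_rows = [
    {"volume": 10.0},
    {"volume": 30.0},
    {"volume": None},
    {"volume": float("nan")},
]
toy_summary = column_summary(toy_rows, "volume")
print(toy_summary)
```

To try this against the real data, one option is `column_summary(itertools.islice(load_wbm(), 1000), "volume")`, which reads only the first 1,000 streamed records.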