LeMat-Bulk

LeMat-Bulk is a materials science and chemistry dataset intended to support research based on density functional theory (DFT) calculations. It is published in four configurations (compatible_pbe, compatible_pbesol, compatible_scan, non_compatible), with the compatible subsets filtered according to the DFT functional and pseudopotentials used. Each entry records structural and chemical features such as elements, chemical formulas, lattice vectors, and energy properties. The dataset card also describes how compatibility is ensured and how entries are deduplicated. LeMat-Bulk is distributed under the CC‑BY‑4.0 license and can be loaded in Python with the Hugging Face datasets library.

Updated 12/10/2024

Description

LeMat‑Bulk Dataset Overview

Dataset Description

Configuration Information

  • compatible_pbe:

    • Features:
      • elements: sequence[string]
      • nsites: int
      • chemical_formula_anonymous: string
      • chemical_formula_reduced: string
      • chemical_formula_descriptive: string
      • nelements: int
      • dimension_types: sequence[int]
      • nperiodic_dimensions: int
      • lattice_vectors: sequence[sequence[float]]
      • immutable_id: string
      • cartesian_site_positions: sequence[sequence[float]]
      • species: string
      • species_at_sites: sequence[string]
      • last_modified: string
      • elements_ratios: sequence[float]
      • stress_tensor: sequence[sequence[float]]
      • energy: float
      • magnetic_moments: sequence[float]
      • forces: sequence[sequence[float]]
      • total_magnetization: float
      • dos_ef: float
      • functional: string
      • cross_compatibility: bool
      • entalpic_fingerprint: string
    • Splits:
      • train: 5,335,299 samples, 8,043,765,194 bytes
    • Download Size: 3,036,919,717 bytes
    • Dataset Size: 8,043,765,194 bytes
  • compatible_pbesol:

    • Features: same as compatible_pbe
    • Splits:
      • train: 447,824 samples, 646,300,349 bytes
    • Download Size: 230,878,194 bytes
    • Dataset Size: 646,300,349 bytes
  • compatible_scan:

    • Features: same as compatible_pbe
    • Splits:
      • train: 422,840 samples, 597,846,818 bytes
    • Download Size: 207,887,396 bytes
    • Dataset Size: 597,846,818 bytes
  • non_compatible:

    • Features: same as compatible_pbe
    • Splits:
      • train: 519,627 samples, 818,845,899 bytes
    • Download Size: 268,949,608 bytes
    • Dataset Size: 818,845,899 bytes
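A minimal sketch of loading one configuration with the Hugging Face datasets library. The repository id `LeMaterial/LeMat-Bulk` is an assumption (this page does not state it), and `load_subset` is a hypothetical helper name. Streaming is used by default so that the multi-gigabyte compatible_pbe split is not downloaded in full.

```python
# Sample counts per configuration, copied from the listing above.
CONFIG_SAMPLES = {
    "compatible_pbe": 5_335_299,
    "compatible_pbesol": 447_824,
    "compatible_scan": 422_840,
    "non_compatible": 519_627,
}

# Assumption: the Hub repository id is not given on this page.
REPO_ID = "LeMaterial/LeMat-Bulk"


def load_subset(config="compatible_pbe", streaming=True):
    """Return the train split of one configuration (hypothetical helper)."""
    if config not in CONFIG_SAMPLES:
        raise ValueError(
            f"unknown configuration {config!r}; expected one of {sorted(CONFIG_SAMPLES)}"
        )
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config, split="train", streaming=streaming)
```

With the library installed and network access available, `next(iter(load_subset("compatible_pbesol")))` would yield a single row as a dictionary keyed by the feature names listed above.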

Data Fields

| Feature Name | Data Type | Description | OPTIMADE Required Field |
|---|---|---|---|
| elements | sequence[string] | List of elements in the structure | |
| nsites | int | Total number of sites in the structure | |
| chemical_formula_anonymous | string | Anonymous chemical formula | |
| chemical_formula_reduced | string | Reduced chemical formula | |
| chemical_formula_descriptive | string | Descriptive chemical formula | |
| nelements | int | Number of distinct elements in the structure | |
| dimension_types | sequence[int] | Periodic boundary condition types | |
| nperiodic_dimensions | int | Number of periodic dimensions | |
| lattice_vectors | sequence[sequence[float]] | Lattice vectors | |
| immutable_id | string | Material ID | |
| cartesian_site_positions | sequence[sequence[float]] | Cartesian site positions | |
| species | JSON | Species information | |
| species_at_sites | sequence[string] | Chemical element at each site | |
| last_modified | datetime | Last modification date | |
| elements_ratios | dict | Elemental composition ratios | |
| stress_tensor | sequence[sequence[float]] | Stress tensor | |
| energy | float | Uncorrected energy | |
| magnetic_moments | sequence[float] | Magnetic moment per site | |
| forces | sequence[sequence[float]] | Force per site | |
| total_magnetization | float | Total magnetization of the structure | |
| functional | string | Computational functional | |
| cross_compatibility | bool | Compatibility with other rows | |
| entalpic_fingerprint | string | Material fingerprint | |
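To illustrate how the structural fields relate to each other, here is a small sketch that checks a row for internal consistency and derives the cell volume from lattice_vectors. The helper names and the toy row are hypothetical, and the numeric values are made up for illustration rather than taken from the dataset.

```python
def cell_volume(lattice_vectors):
    """Volume of the cell spanned by three lattice vectors: |a . (b x c)|."""
    a, b, c = lattice_vectors
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2])


def check_row(row):
    """Return a list of consistency problems found in one row (empty if none)."""
    problems = []
    if row["nsites"] != len(row["species_at_sites"]):
        problems.append("nsites does not match species_at_sites")
    if row["nsites"] != len(row["cartesian_site_positions"]):
        problems.append("nsites does not match cartesian_site_positions")
    if row["nelements"] != len(row["elements"]):
        problems.append("nelements does not match elements")
    if len(row["lattice_vectors"]) != 3 or any(len(v) != 3 for v in row["lattice_vectors"]):
        problems.append("lattice_vectors is not a 3x3 matrix")
    return problems


# Toy row with made-up values; only the field names come from the table above.
toy_row = {
    "elements": ["Cl", "Na"],
    "nelements": 2,
    "nsites": 2,
    "species_at_sites": ["Na", "Cl"],
    "cartesian_site_positions": [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    "lattice_vectors": [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
}

print(check_row(toy_row))
print(cell_volume(toy_row["lattice_vectors"]))
```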

Available Subsets

  • Compatible, PBE (default): Rows filtered for DFT compatibility, containing only PBE records.
  • Compatible, PBEsol: Contains only PBEsol data.
  • Compatible, SCAN: Contains only SCAN data.
  • All: All records.
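Since the subsets are organised by functional, a quick way to inspect any collection of rows is to tally them by the functional and cross_compatibility fields. This is only a sketch: the string values used for functional in the toy rows are assumptions, and `tally_by_functional` is a hypothetical helper.

```python
from collections import Counter


def tally_by_functional(rows):
    """Count rows per (functional, cross_compatibility) pair."""
    return Counter((row["functional"], row["cross_compatibility"]) for row in rows)


# Toy rows; the functional labels are assumed, not taken from the dataset.
toy_rows = [
    {"functional": "pbe", "cross_compatibility": True},
    {"functional": "pbe", "cross_compatibility": True},
    {"functional": "pbesol", "cross_compatibility": True},
    {"functional": "pbe", "cross_compatibility": False},
]

for (functional, compatible), count in sorted(tally_by_functional(toy_rows).items()):
    print(functional, compatible, count)
```

The same function accepts any iterable of row dictionaries, including a streamed split.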

Database Statistics

| Database | Number of Materials | Number of Structures |
|---|---|---|
| Materials Project | 148,453 | 189,403 |
| Alexandria | 4,635,066 | 5,459,260 |
| OQMD | 1,076,926 | 1,076,926 |
| LeMaterial (All) | 5,860,446 | 6,725,590 |
| LeMaterial (Compatible, PBE) | 5,335,299 | 5,335,299 |
| LeMaterial (Compatible, PBEsol) | 447,824 | 447,824 |
| LeMaterial (Compatible, SCAN) | 422,840 | 422,840 |
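The figures above can be cross-checked against the split sizes listed under Configuration Information: the four train splits add up to 6,725,590, which is the structure count reported for LeMaterial (All). The sketch below reproduces that arithmetic and derives a structures-per-material ratio for each source database; the variable and function names are illustrative only.

```python
# Train split sizes from the Configuration Information section.
CONFIG_TRAIN_SAMPLES = {
    "compatible_pbe": 5_335_299,
    "compatible_pbesol": 447_824,
    "compatible_scan": 422_840,
    "non_compatible": 519_627,
}

# (number of materials, number of structures) from the statistics table.
DATABASES = {
    "Materials Project": (148_453, 189_403),
    "Alexandria": (4_635_066, 5_459_260),
    "OQMD": (1_076_926, 1_076_926),
    "LeMaterial (All)": (5_860_446, 6_725_590),
}


def structures_per_material(name):
    """Average number of structures recorded per material in one database."""
    materials, structures = DATABASES[name]
    return structures / materials


total_samples = sum(CONFIG_TRAIN_SAMPLES.values())
all_structures = DATABASES["LeMaterial (All)"][1]
print("split sizes match LeMaterial (All):", total_samples == all_structures)

for name in DATABASES:
    print(f"{name}: {structures_per_material(name):.3f} structures per material")
```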

Methods

Compatibility Compliance

  • Pseudopotentials: Records were required to use consistent pseudopotentials.
  • Hubbard U Parameters: Records containing specific elements were excluded.
  • Spin Polarization: Non‑spin‑polarized calculations were excluded.
  • Convergence Criteria: No records were excluded based on convergence settings.
  • Energy Above Convex Hull: High‑energy materials were not filtered out.

Deduplication Method

  • Compute bonds using the EconNN algorithm.
  • Build a structure graph and hash it with the Weisfeiler‑Lehman algorithm.
  • Add symmetry and composition information.
  • Remove duplicate structures, keeping only the lowest‑energy entry.
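The steps above can be sketched in plain Python under several assumptions. Bond detection is not reproduced here, so the bonds are supplied by hand as edge lists instead of being computed with EconNN. The hash is a simplified Weisfeiler‑Lehman refinement written for illustration, and the fingerprint layout (graph hash, space group number, formula joined by underscores) is hypothetical rather than the dataset's actual format. Energies and entries are made up.

```python
import hashlib
from collections import defaultdict


def _digest(text):
    """Short, stable hex digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def wl_graph_hash(labels, edges, iterations=3):
    """Simplified Weisfeiler-Lehman hash of an undirected, node-labelled graph."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    current = dict(labels)
    for _ in range(iterations):
        # Each node's new label combines its old label with its neighbours' labels.
        current = {
            node: _digest(label + "|" + ",".join(sorted(current[n] for n in neighbors[node])))
            for node, label in current.items()
        }
    # The graph hash depends only on the multiset of final labels, not on node ids.
    return _digest(",".join(sorted(current.values())))


def fingerprint(entry):
    """Graph hash plus symmetry and composition (hypothetical layout)."""
    graph_hash = wl_graph_hash(entry["labels"], entry["edges"])
    return f"{graph_hash}_{entry['space_group']}_{entry['formula']}"


def deduplicate(entries):
    """Keep only the lowest-energy entry for each fingerprint."""
    best = {}
    for entry in entries:
        key = fingerprint(entry)
        if key not in best or entry["energy"] < best[key]["energy"]:
            best[key] = entry
    return list(best.values())


# Toy entries: "a" and "b" describe the same bonded graph with different site order.
entries = [
    {"id": "a", "labels": {0: "Na", 1: "Cl"}, "edges": [(0, 1)],
     "space_group": 225, "formula": "ClNa", "energy": -6.8},
    {"id": "b", "labels": {0: "Cl", 1: "Na"}, "edges": [(1, 0)],
     "space_group": 225, "formula": "ClNa", "energy": -7.1},
    {"id": "c", "labels": {0: "Na", 1: "Na"}, "edges": [(0, 1)],
     "space_group": 229, "formula": "Na", "energy": -2.6},
]

kept = deduplicate(entries)
print(sorted(entry["id"] for entry in kept))  # ['b', 'c']
```

Entry "b" replaces "a" because both share a fingerprint and "b" has the lower energy, while "c" has a different graph, symmetry, and composition and is kept as a separate entry.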

Future Updates

  • Planned release of band gap information for all materials.
  • Unified energy corrections.
  • Publication of Bader charges.

Topics

Materials Science
Density Functional Theory

Source

Organization: huggingface

Created: 12/7/2024
