CrystalDFT is a small molecular crystal database created by the Bernal Institute at the University of Limerick, containing DFT‑predicted electromechanical properties for 572 organic crystals. The dataset was generated via high‑throughput screening to identify sustainable materials with excellent piezoelectric performance, aiming to replace lead‑based piezoelectrics. Applications focus on the development and optimization of piezoelectric materials, addressing environmental and health concerns associated with traditional lead‑based compounds.
LeMatBulk is a materials science and chemistry dataset offered in several configurations (`compatible_pbe`, `compatible_pbesol`, `compatible_scan`, `non_compatible`). Its entries carry chemical and structural features such as elements, chemical formulas, lattice vectors, and energy properties. The dataset is intended to support materials science research, particularly work based on density functional theory (DFT) calculations, and its subsets are filtered for compatibility according to different DFT functionals and pseudopotentials. The dataset description also covers the methods used to ensure compatibility and to deduplicate entries. Distributed under the CC‑BY‑4.0 license, it can be loaded in Python through the Hugging Face datasets library.
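As a minimal sketch of loading one configuration in Python, the helper below wraps `load_dataset` from the Hugging Face `datasets` package. The repository ID `LeMaterial/LeMat-Bulk`, the `train` split name, and the helper `load_lemat_subset` are assumptions that do not come from the description above; check the dataset card before relying on them.

```python
# Configuration names as listed in the dataset description.
CONFIGS = ("compatible_pbe", "compatible_pbesol", "compatible_scan", "non_compatible")

# Assumed repository ID on the Hugging Face Hub; verify against the dataset card.
REPO_ID = "LeMaterial/LeMat-Bulk"


def load_lemat_subset(config="compatible_pbe", split="train", streaming=True):
    """Load one LeMatBulk configuration (hypothetical helper).

    Streaming avoids downloading the full subset before the first row is read.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the validation above works without the package installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config, split=split, streaming=streaming)
```

Calling `load_lemat_subset("compatible_pbe")` and taking `next(iter(...))` on the result would yield a single row as a dictionary, which is a quick way to inspect the available columns without assuming their names.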
This dataset provides material‑science features, including chemical formula, number of sites, volume, energy, and band gap, for use in material property research and prediction. It comprises 256,963 samples (725 MB in total; 156 MB to download).
This dataset contains per‑atom formation energy data for 133,420 materials. It is provided as two main files: `index.json`, which includes material indices, IDs, formulas, atom counts, and per‑atom formation energies; and `data.hdf5`, which stores structural information (lattice, number of atoms, per‑atom energy, atom pointers) and atomic data (positions, atomic numbers).
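The "atom pointers" mentioned above suggest a flat layout in which all atoms are stored in one long array and a pointer array marks where each material begins. The sketch below illustrates that idea with plain Python lists. It is an assumption about the layout, not a reading of the actual file: the names `atom_ptr`, `atomic_numbers`, `positions`, `atoms_of`, and `num_atoms` are hypothetical, the pointer array is assumed to have one more entry than there are materials, and the values are toy data rather than entries from the dataset.

```python
# Toy stand-in for a pointer-based layout; real key names and dtypes in
# data.hdf5 are not given in the description and may differ.
atom_ptr = [0, 2, 5]  # material i owns atoms atom_ptr[i] : atom_ptr[i + 1]
atomic_numbers = [11, 17, 8, 1, 1]  # one entry per atom, across all materials
positions = [
    (0.0, 0.0, 0.0), (0.5, 0.5, 0.5),                        # material 0
    (0.0, 0.0, 0.0), (0.76, 0.59, 0.0), (-0.76, 0.59, 0.0),  # material 1
]


def num_atoms(material_index):
    """Atom count recovered from two consecutive pointers."""
    return atom_ptr[material_index + 1] - atom_ptr[material_index]


def atoms_of(material_index):
    """Slice the flat arrays to get one material's atomic numbers and positions."""
    start, stop = atom_ptr[material_index], atom_ptr[material_index + 1]
    return atomic_numbers[start:stop], positions[start:stop]


numbers, coords = atoms_of(1)
print(num_atoms(1), numbers)
```

A layout like this keeps variable-sized structures in fixed-shape arrays, so a single material can be read with one contiguous slice instead of a per-material group lookup.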