Explore high-quality datasets for your AI and machine learning projects.
The miniCTX dataset is a neural theorem proving dataset that emphasizes proof tasks within long-context environments. It consists of Lean 4 tactic prediction examples extracted from Mathlib, each containing a prompt (instruction, preceding file content, proof state) and a completion (tactic). File contents are truncated to 1,024 tokens. The dataset was generated using the `ntptoolkit` utilities `ntp‑training‑data` and `instruction_tuning.py`, and detailed generation configuration is provided.