l3lab/ntp-mathlib-instruct-context
The miniCTX dataset is a neural theorem proving dataset that emphasizes proof tasks within long-context environments. It consists of Lean 4 tactic prediction examples extracted from Mathlib, each containing a prompt (instruction, preceding file content, proof state) and a completion (tactic). File contents are truncated to 1,024 tokens. The dataset was generated using the `ntptoolkit` utilities `ntp‑training‑data` and `instruction_tuning.py`, and detailed generation configuration is provided.
Description
Dataset Overview
Dataset Name
miniCTX: Neural Theorem Proving with (Long‑)Contexts
Dataset Content
- Source: Lean 4 tactic prediction examples extracted from Mathlib.
- Example Structure:
- prompt:
- Contains instruction, preceding file content, proof state or instruction, proof state
- completion: tactic
- prompt:
- File Content: Truncated to 1,024 tokens.
Dataset Configuration
- Configuration Name: default
- Data Files:
- split: train, dev, test
- path: "with_context_mathlib_only/with_context_train*", "with_context_mathlib_only/with_context_dev*", "with_context_mathlib_only/with_context_test*"
Version Information
- Generation Tools:
ntptoolkit'sntp‑training‑dataandinstruction_tuning.py - Configuration Details:
{ "repo": "https://github.com/leanprover-community/mathlib4", "commit": "cf8e23a62939ed7cc530fbb68e83539730f32f86", "lean": "leanprover/lean4:v4.4.0", "name": "mathlib", "import_file": "Mathlib.lean", "imports": ["Mathlib"] }
Citation
@misc{hu2024minictx, author = {Jiewen Hu and Thomas Zhu and Sean Welleck}, title = {miniCTX: Neural Theorem Proving with (Long‑)Contexts}, year = {2024}, eprint={}, archivePrefix={arXiv}, }
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.