JUHE API Marketplace
DATASET
Open Source Community

l3lab/ntp-mathlib-instruct-context

The miniCTX dataset is a neural theorem proving dataset that emphasizes proof tasks within long-context environments. It consists of Lean 4 tactic prediction examples extracted from Mathlib, each containing a prompt (instruction, preceding file content, proof state) and a completion (tactic). File contents are truncated to 1,024 tokens. The dataset was generated using the `ntptoolkit` utilities `ntp‑training‑data` and `instruction_tuning.py`, and detailed generation configuration is provided.

Updated 9/6/2024
hugging_face

Description

Dataset Overview

Dataset Name

miniCTX: Neural Theorem Proving with (Long‑)Contexts

Dataset Content

  • Source: Lean 4 tactic prediction examples extracted from Mathlib.
  • Example Structure:
    • prompt:
      • Contains instruction, preceding file content, proof state or instruction, proof state
    • completion: tactic
  • File Content: Truncated to 1,024 tokens.

Dataset Configuration

  • Configuration Name: default
  • Data Files:
    • split: train, dev, test
    • path: "with_context_mathlib_only/with_context_train*", "with_context_mathlib_only/with_context_dev*", "with_context_mathlib_only/with_context_test*"

Version Information

  • Generation Tools: ntptoolkit's ntp‑training‑data and instruction_tuning.py
  • Configuration Details:
    {
        "repo": "https://github.com/leanprover-community/mathlib4",
        "commit": "cf8e23a62939ed7cc530fbb68e83539730f32f86",
        "lean": "leanprover/lean4:v4.4.0",
        "name": "mathlib",
        "import_file": "Mathlib.lean",
        "imports": ["Mathlib"]
    }
    

Citation

@misc{hu2024minictx, author = {Jiewen Hu and Thomas Zhu and Sean Welleck}, title = {miniCTX: Neural Theorem Proving with (Long‑)Contexts}, year = {2024}, eprint={}, archivePrefix={arXiv}, }

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Login to Access

Please login to view download links and access full dataset details.

Topics

Theorem Proving
Lean4

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.