agicorp/MathInstruct

MathInstruct is a lightweight, curated instruction‑tuning dataset for mathematical reasoning. It aggregates 13 math reasoning datasets, six of which are newly curated in this work. Its distinguishing feature is a hybrid of chain‑of‑thought (CoT) rationales, written in natural language, and program‑of‑thought (PoT) rationales, written as executable code, chosen to give broad coverage across mathematical domains. The dataset targets English text generation and falls in the 100 k to 1 M example size category. It was used to train the MAmmoTH models, which are built on Llama‑2 and Code Llama and range from 7 B to 70 B parameters. Each subset carries its own license, listed under License Details.

Updated 3/23/2024

Description

Dataset Overview

Name: MathInstruct

License: MIT (individual subsets vary; see License Details)

Task Category: Text Generation

Language: English

Size Category: 100 k–1 M examples

Tags: Mathematics
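
As a minimal sketch of how the listing above might be used, the snippet below loads the dataset from the Hugging Face Hub with the `datasets` library and checks a row count against the listed size category. The repository identifier is taken from this page; the split name "train" and the helper names are assumptions made for illustration, not something the listing confirms.

```python
REPO_ID = "agicorp/MathInstruct"  # identifier as shown on this listing


def load_mathinstruct(split: str = "train"):
    """Download the dataset from the Hugging Face Hub.

    Assumption: the repository exposes a split named "train".
    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily so the rest runs without it

    return load_dataset(REPO_ID, split=split)


def in_size_category(num_examples: int) -> bool:
    """Return True if a row count falls in the listed 100 k to 1 M category."""
    return 100_000 <= num_examples <= 1_000_000
```

After a successful download, `in_size_category(len(load_mathinstruct()))` would be expected to hold if the listing's size category is accurate.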

Dataset Details

  • Source: MathInstruct aggregates 13 math reasoning datasets, six of which are newly compiled in this work.
  • Features: Emphasizes a hybrid of chain‑of‑thought (CoT) and program‑of‑thought (PoT) reasoning, covering a wide range of mathematical fields.
  • Models:
    • Base Models: Llama‑2 and Code Llama
    • Model Variants:
      • 7B: MAmmoTH‑7B, MAmmoTH‑Coder‑7B
      • 13B: MAmmoTH‑13B, MAmmoTH‑Coder‑13B
      • 34B: MAmmoTH‑Coder‑34B
      • 70B: MAmmoTH‑70B
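
The difference between the two rationale styles named under Features can be sketched as follows. Both records are hypothetical and written for illustration only, and the field names `instruction` and `output` are assumptions about the schema. A CoT answer is read directly from the text, while a PoT answer is obtained by running the generated program.

```python
import contextlib
import io

# Hypothetical records illustrating the two styles; not taken from the dataset.
cot_example = {
    "instruction": "A pen costs 3 dollars. How much do 4 pens cost?",
    "output": "Each pen costs 3 dollars, so 4 pens cost 4 * 3 = 12 dollars. "
              "The answer is 12.",
}
pot_example = {
    "instruction": "A pen costs 3 dollars. How much do 4 pens cost? "
                   "Write a Python program to solve it.",
    "output": "price = 3\ncount = 4\nprint(price * count)",
}


def run_pot(program: str) -> str:
    """Execute a PoT rationale and return whatever it prints.

    exec() runs arbitrary code, so this is only appropriate for trusted
    input; real pipelines would need a sandbox.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # fresh globals so programs do not share state
    return buffer.getvalue().strip()


if __name__ == "__main__":
    print(run_pot(pot_example["output"]))
```

Delegating the arithmetic to an interpreter is what separates PoT from CoT: the model only has to write a correct program, not carry out the computation in prose.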

License Details

  • GSM8K: MIT
  • GSM8K‑RFT: Not listed
  • AQuA‑RAT: Apache‑2.0
  • MATH: MIT
  • TheoremQA: MIT
  • Camel‑Math: Creative Commons Attribution‑NonCommercial 4.0 International (CC BY‑NC 4.0)
  • NumGLUE: Apache‑2.0
  • MathQA: Apache‑2.0
  • Our Curated: MIT
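
Because the subsets carry different licenses, it can help to make the list above machine‑checkable before reusing the data. The sketch below transcribes the list into a dictionary and picks out the subsets under MIT or Apache‑2.0. The SPDX‑style identifiers, the `PERMISSIVE` set, and the function name are illustrative choices rather than part of the dataset, and the sketch is not legal advice.

```python
from typing import Mapping, Optional

# Transcribed from the list above, with ASCII hyphens in the names.
# None marks a subset whose license is "Not listed".
SUBSET_LICENSES: Mapping[str, Optional[str]] = {
    "GSM8K": "MIT",
    "GSM8K-RFT": None,
    "AQuA-RAT": "Apache-2.0",
    "MATH": "MIT",
    "TheoremQA": "MIT",
    "Camel-Math": "CC-BY-NC-4.0",  # Attribution-NonCommercial 4.0 International
    "NumGLUE": "Apache-2.0",
    "MathQA": "Apache-2.0",
    "Our Curated": "MIT",
}

# Illustrative allow-list; adjust it to your own policy.
PERMISSIVE = {"MIT", "Apache-2.0"}


def permissive_subsets(licenses: Mapping[str, Optional[str]]) -> list:
    """Return the names of subsets whose license is in PERMISSIVE, sorted."""
    return sorted(name for name, lic in licenses.items() if lic in PERMISSIVE)


if __name__ == "__main__":
    print(permissive_subsets(SUBSET_LICENSES))
```

Subsets with an unlisted or non‑commercial license (here GSM8K‑RFT and Camel‑Math) are excluded by construction and would need a separate review.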

Citation

@article{yue2023mammoth,
  title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
  author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
  journal={arXiv preprint arXiv:2309.05653},
  year={2023}
}

Topics

Mathematics
Model Training

Source

Organization: hugging_face

Created: Unknown
