agicorp/MathInstruct
MathInstruct is a curated, lightweight instruction‑tuning dataset for mathematical reasoning. It aggregates 13 math reasoning datasets, six of which are newly curated in this work. Its rationales combine chain‑of‑thought (CoT) and program‑of‑thought (PoT) reasoning and cover a broad range of mathematical domains. The dataset targets English text generation and falls in the 100 k to 1 M examples size category. It is associated with the MAmmoTH models, which are based on Llama‑2 and Code Llama and range from 7 B to 70 B parameters. Each subset carries its own license, listed under License Details.
Description
Dataset Overview
Name: MathInstruct
License: MIT
Task Category: Text Generation
Language: English
Size Category: 100 k–1 M examples
Tags: Mathematics
Dataset Details
- Source: MathInstruct aggregates 13 math reasoning datasets, six of which are newly compiled in this work.
- Features: Emphasizes a hybrid of chain‑of‑thought (CoT) and program‑of‑thought (PoT) reasoning, covering a wide range of mathematical fields.
- Models:
  - Base Models: Llama‑2 and Code Llama
  - Model Variants:
    - 7B: MAmmoTH‑7B, MAmmoTH‑Coder‑7B
    - 13B: MAmmoTH‑13B, MAmmoTH‑Coder‑13B
    - 34B: MAmmoTH‑Coder‑34B
    - 70B: MAmmoTH‑70B
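To make the CoT/PoT distinction concrete, the following minimal sketch contrasts the two rationale styles on a toy problem. The two records, the field names `instruction` and `output`, and the helpers `cot_answer` and `run_pot` are hypothetical illustrations; they are not taken from the dataset's actual schema or from the authors' tooling.

```python
import contextlib
import io

# Hypothetical records; the field names are assumptions for illustration only.
cot_example = {
    "instruction": "A shop sells pens at 3 dollars each. How much do 4 pens cost?",
    "output": "Each pen costs 3 dollars, so 4 pens cost 4 * 3 = 12 dollars. "
              "The answer is 12",
}
pot_example = {
    "instruction": "A shop sells pens at 3 dollars each. How much do 4 pens cost?",
    "output": "price = 3\ncount = 4\nprint(price * count)",
}


def cot_answer(rationale: str) -> str:
    """Read the final answer from a CoT rationale ending in 'The answer is X'."""
    return rationale.rsplit("The answer is", 1)[1].strip()


def run_pot(program: str) -> str:
    """Execute a PoT rationale and return whatever it prints.

    exec() on untrusted text is unsafe; a real pipeline should sandbox this.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(program, {})
    return buffer.getvalue().strip()


print(cot_answer(cot_example["output"]))  # prints "12"
print(run_pot(pot_example["output"]))     # prints "12"
```

In the CoT record the answer is stated in natural language and must be parsed out of the text, whereas in the PoT record the answer is produced by running the program, which delegates the arithmetic to the interpreter.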
License Details
- GSM8K: MIT
- GSM8K‑RFT: Not listed
- AQuA‑RAT: Apache‑2.0
- MATH: MIT
- TheoremQA: MIT
- Camel‑Math: Attribution‑NonCommercial 4.0 International
- NumGLUE: Apache‑2.0
- MathQA: Apache‑2.0
- Our Curated: MIT
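Because the subsets carry different licenses, it can help to keep the list in machine-readable form. The sketch below simply transcribes the License Details entries into a dictionary and selects subsets by license string. The names `SUBSET_LICENSES` and `subsets_with_license` are hypothetical, and the selection is a plain string match, not legal guidance.

```python
from typing import Dict, List, Optional, Set

# Licenses transcribed from the License Details list.
# None marks a subset whose license is not listed.
SUBSET_LICENSES: Dict[str, Optional[str]] = {
    "GSM8K": "MIT",
    "GSM8K-RFT": None,
    "AQuA-RAT": "Apache-2.0",
    "MATH": "MIT",
    "TheoremQA": "MIT",
    "Camel-Math": "Attribution-NonCommercial 4.0 International",
    "NumGLUE": "Apache-2.0",
    "MathQA": "Apache-2.0",
    "Our Curated": "MIT",
}


def subsets_with_license(allowed: Set[str]) -> List[str]:
    """Return the subsets whose listed license is in the allowed set."""
    return [name for name, lic in SUBSET_LICENSES.items() if lic in allowed]


permissive = subsets_with_license({"MIT", "Apache-2.0"})
print(permissive)
```

Subsets with an unlisted license (GSM8K‑RFT) or a non-commercial license (Camel‑Math) are excluded from `permissive` and would need to be reviewed separately.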
Citation
@article{yue2023mammoth,
title={MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning},
author={Xiang Yue and Xingwei Qu and Ge Zhang and Yao Fu and Wenhao Huang and Huan Sun and Yu Su and Wenhu Chen},
journal={arXiv preprint arXiv:2309.05653},
year={2023}
}
Source
Organization: hugging_face
Created: Unknown