Dataset asset · Open Source Community · Natural Language Processing · Math Problem Solving
meta-math/MetaMathQA
MetaMathQA is a dataset augmented from the training sets of GSM8K and MATH; no test‑set data are used. Every item in `meta-math/MetaMathQA` is augmented from an original question in the GSM8K or MATH training set.
Source
hugging_face
Created
Nov 28, 2025
Updated
Dec 21, 2023
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: MetaMathQA
- Data Augmentation Source: Enhanced from the training sets of GSM8K and MATH.
- Test Set Usage: No test‑set data are included in the augmentation.
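Since each MetaMathQA item pairs an augmented question with its solution and records which augmentation produced it, filtering by augmentation type is a common first step. The sketch below is illustrative only: the field names (`type`, `query`, `response`) and the type labels are assumptions about the record schema, not taken from this page.

```python
# Illustrative MetaMathQA-style records; field names and type labels are
# assumed for this sketch, not confirmed by the dataset card above.
records = [
    {"type": "GSM_Rephrased", "query": "Weng earns $12 an hour ...", "response": "..."},
    {"type": "MATH_AnsAug", "query": "Solve for x: 2x + 3 = 7.", "response": "..."},
]

def by_source(items, prefix):
    """Keep records whose augmentation type starts with `prefix` (e.g. 'GSM')."""
    return [r for r in items if r["type"].startswith(prefix)]

gsm_only = by_source(records, "GSM")  # records augmented from GSM8K-style items
```

In practice one would pull the real records with the Hugging Face `datasets` library and inspect `column_names` before relying on any particular field.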
Model Training
- Model Name: MetaMath‑Mistral‑7B
- Base Model: Mistral‑7B
- Training Dataset: MetaMathQA
- Performance Improvement: Training on MetaMathQA and switching the base model from LLaMA‑2‑7B to Mistral‑7B raises GSM8K Pass@1 from 66.5 to 77.7.
Experimental Results
- Model Performance Comparison:
- MetaMath‑Mistral‑7B: GSM8K Pass@1 = 77.7, MATH Pass@1 = 28.2.
- Other Models: Baselines include MPT‑7B, Falcon‑7B, LLaMA‑1‑7B, etc.; detailed metrics appear in the experiment table.
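The Pass@1 figures above are typically the fraction of test problems whose single (greedy) generated answer matches the reference, reported as a percentage. A minimal sketch, with illustrative answer normalization (real GSM8K/MATH scoring extracts and normalizes the final answer more carefully):

```python
def pass_at_1(predictions, references):
    """Percentage of problems where the single prediction matches the reference.

    Normalization here is just whitespace stripping -- an illustrative
    simplification of real answer-extraction pipelines.
    """
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

score = pass_at_1(["18", "42 ", "7"], ["18", "42", "9"])  # two of three correct
```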
Citation
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}