Dataset asset · Open Source Community · Natural Language Processing · Math Problem Solving
meta-math/MetaMathQA
MetaMathQA is a dataset augmented from the training sets of GSM8K and MATH; no test‑set data are used. Every item in `meta-math/MetaMathQA` is augmented from an original question in the GSM8K or MATH training set.
Source
hugging_face
Created
Nov 28, 2025
Updated
Dec 21, 2023
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: MetaMathQA
- Data Augmentation Source: Enhanced from the training sets of GSM8K and MATH.
- Test Set Usage: No test‑set data are included in the augmentation.
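Since each MetaMathQA item pairs an augmented question with its solution and records which augmentation produced it, filtering by augmentation type is a common first step. The sketch below is illustrative only: the field names (`type`, `query`, `response`) and the type labels are assumptions about the record schema, not taken from this page.

```python
# Illustrative MetaMathQA-style records; field names and type labels are
# assumed for this sketch, not confirmed by the dataset card above.
records = [
    {"type": "GSM_Rephrased", "query": "Weng earns $12 an hour ...", "response": "..."},
    {"type": "MATH_AnsAug", "query": "Solve for x: 2x + 3 = 7.", "response": "..."},
]

def by_source(items, prefix):
    """Keep records whose augmentation type starts with `prefix` (e.g. 'GSM')."""
    return [r for r in items if r["type"].startswith(prefix)]

gsm_only = by_source(records, "GSM")  # records augmented from GSM8K-style items
```

In practice one would pull the real records with the Hugging Face `datasets` library and inspect `column_names` before relying on any particular field.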
Model Training
- Model Name: MetaMath‑Mistral‑7B
- Base Model: Mistral‑7B
- Training Dataset: MetaMathQA
- Performance Improvement: Training on MetaMathQA and switching the base model from LLaMA‑2‑7B to Mistral‑7B raises GSM8K Pass@1 from 66.5 to 77.7.
Experimental Results
- Model Performance Comparison:
- MetaMath‑Mistral‑7B: GSM8K Pass@1 = 77.7, MATH Pass@1 = 28.2.
- Other Models: Baselines include MPT‑7B, Falcon‑7B, LLaMA‑1‑7B, etc.; detailed metrics appear in the experiment table.
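The Pass@1 figures above are typically the fraction of test problems whose single (greedy) generated answer matches the reference, reported as a percentage. A minimal sketch, with illustrative answer normalization (real GSM8K/MATH scoring extracts and normalizes the final answer more carefully):

```python
def pass_at_1(predictions, references):
    """Percentage of problems where the single prediction matches the reference.

    Normalization here is just whitespace stripping -- an illustrative
    simplification of real answer-extraction pipelines.
    """
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

score = pass_at_1(["18", "42 ", "7"], ["18", "42", "9"])  # two of three correct
```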
Citation
@article{yu2023metamath,
  title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
  author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
  journal={arXiv preprint arXiv:2309.12284},
  year={2023}
}