Dataset asset | Open Source Community | Mathematical Reasoning | Chinese Math QA

meta-math/GSM8K_zh

GSM8K_zh is a Chinese dataset tailored for mathematical reasoning, consisting of problems and answers translated from the English GSM8K dataset. It includes 7,473 training samples and 1,319 test samples. Training samples contain full questions and answers; test samples provide only the translated questions. The dataset is suitable for Chinese–English question‑answering tasks, especially for mathematical problem solving.

Source: hugging_face
Created: Nov 28, 2025
Updated: Dec 4, 2023
Signals: 938 views
Availability: Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Basic Information

  • License: MIT
  • Task Category: Question Answering
  • Languages: English, Chinese
  • Tags: Mathematics, Math QA, Chinese Math QA
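  • Scale: n<1K (as tagged on the source card; the sample counts listed below total 8,792, which falls in the 1K<n<10K range)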

Dataset Description

  • Name: GSM8K_zh
  • Description: GSM8K_zh is a dataset for Chinese mathematical reasoning. Question‑answer pairs are translated from the GSM8K dataset (https://github.com/openai/grade-school-math/tree/master) using few‑shot prompting with GPT‑3.5‑Turbo.
  • Sample Count: 7,473 training samples and 1,319 test samples. Training data are used for supervised fine‑tuning, while test data are for evaluation.
  • Sample Structure:
    • Training: includes question_zh (question) and answer_zh (answer) keys.
    • Test: provides only the translated question (question_zh).
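
The sample structure above can be illustrated with a minimal Python sketch. The records below are hypothetical examples written for illustration, not rows copied from the dataset, and the helper names (`to_sft_pair`, `extract_final_answer`) are assumptions introduced here. The sketch also assumes that translated answers keep the `#### <number>` suffix used by the English GSM8K; the description above does not confirm this, so the helper returns `None` when the marker is missing.

```python
from typing import Optional

# Hypothetical records shaped like the structure described above.
# The text is illustrative only and is not taken from the dataset.
train_records = [
    {
        "question_zh": "小明有3个苹果,又买了2个。他现在有多少个苹果?",
        "answer_zh": "小明现在有 3 + 2 = 5 个苹果。\n#### 5",
    },
]
test_records = [
    {"question_zh": "一本书有120页,小红每天读30页。她需要几天读完?"},
]


def to_sft_pair(record: dict) -> dict:
    """Map a training record to a prompt/completion pair for fine-tuning."""
    return {"prompt": record["question_zh"], "completion": record["answer_zh"]}


def extract_final_answer(answer: str) -> Optional[str]:
    """Return the text after the last '####' marker, or None if absent.

    Assumption: translated answers keep the '#### <number>' suffix of the
    English GSM8K. If they do not, this returns None.
    """
    marker = "####"
    if marker not in answer:
        return None
    return answer.rsplit(marker, 1)[1].strip()


# Training records carry both keys; test records carry only question_zh.
sft_pairs = [to_sft_pair(r) for r in train_records]
eval_prompts = [r["question_zh"] for r in test_records]
```

With the Hugging Face `datasets` library, `load_dataset("meta-math/GSM8K_zh")` would be the usual entry point for fetching the real records. How the training and test samples are laid out in the returned object is not stated here, so inspect it before relying on split names.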

Citation

  • Reference:
    @article{yu2023metamath,
      title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
      author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
      journal={arXiv preprint arXiv:2309.12284},
      year={2023}
    }
    