
swulling/gsm8k_chinese

The dataset contains math word problems in Chinese and English, each with an answer. It is suited to text generation tasks, particularly those involving math word problems. It is split into a training set (7,473 samples) and a test set (1,319 samples). Each sample has four fields: question, answer, question_zh-cn (the Chinese question), and answer_only.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Nov 28, 2023

Dataset Overview

Basic Information

Language: Chinese
License: MIT
Size: 1K < n < 10K samples
Source Dataset: gsm8k
Task Category: text2text-generation

Structure

Features

question: string
answer: string
question_zh-cn: string
answer_only: integer
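
The feature list above can be read as a small schema. The following is a minimal sketch of a row check, assuming each row arrives as a plain Python dict and that field names use ordinary ASCII hyphens and underscores. The helper name validate_record and the example row are hypothetical: the row is invented for illustration and is not taken from the dataset.

```python
# Field names and Python types mirroring the Features list above.
EXPECTED_TYPES = {
    "question": str,
    "answer": str,
    "question_zh-cn": str,
    "answer_only": int,
}


def validate_record(row):
    """Return a list of problems in one row (empty if it matches the schema)."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        elif not isinstance(row[name], expected):
            actual = type(row[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# Invented example row for illustration only; it is NOT taken from the dataset.
example_row = {
    "question": "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "answer": "3 + 4 = 7",
    "question_zh-cn": "汤姆有3个苹果，又买了4个。他现在有多少个苹果？",
    "answer_only": 7,
}

print(validate_record(example_row))  # → []
```

A check like this may be useful before training, since a string where answer_only should be an integer would otherwise surface only later as a comparison failure.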

Splits

test:
- Bytes: 1,020,788
- Samples: 1,319
train:
- Bytes: 5,664,657
- Samples: 7,473

Download & Size

Download Size: 3,988,161 bytes
Dataset Size: 6,685,445 bytes
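
As a quick consistency check on the figures above, the sketch below sums the per-split byte counts and compares the total with the reported dataset size. It also derives the training share of samples and the ratio of download size to dataset size. The variable names are illustrative, and all numbers are copied from this page.

```python
# Figures copied from the Splits and Download & Size sections.
SPLITS = {
    "train": {"bytes": 5_664_657, "samples": 7_473},
    "test": {"bytes": 1_020_788, "samples": 1_319},
}
DOWNLOAD_SIZE = 3_988_161
DATASET_SIZE = 6_685_445

total_bytes = sum(split["bytes"] for split in SPLITS.values())
total_samples = sum(split["samples"] for split in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
sizes_agree = total_bytes == DATASET_SIZE

# Fraction of all samples that fall in the training split.
train_share = SPLITS["train"]["samples"] / total_samples

# Download size relative to the dataset size reported above.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total_bytes)    # → 6685445
print(total_samples)  # → 8792
print(sizes_agree)    # → True
print(f"train share: {train_share:.1%}")
print(f"download / dataset size: {download_ratio:.3f}")
```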

Configuration

Config Name: default
Data Files:
- test: data/test-*
- train: data/train-*
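
A minimal loading sketch follows, assuming the Hugging Face datasets library is installed and the repository is still hosted under the identifier shown at the top of this page. The function name load_gsm8k_chinese is hypothetical. The import sits inside the function so the module can be read and imported without the library or network access.

```python
REPO_ID = "swulling/gsm8k_chinese"

# Split-to-file patterns as listed under Data Files for the "default" config.
DATA_FILES = {
    "test": "data/test-*",
    "train": "data/train-*",
}


def load_gsm8k_chinese():
    """Download the dataset and return its splits (requires network access)."""
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    # With no config named, the library should pick the "default" config,
    # which maps the file patterns above to the train and test splits.
    return load_dataset(REPO_ID)


if __name__ == "__main__":
    splits = load_gsm8k_chinese()
    for split_name in DATA_FILES:
        print(split_name, splits[split_name].num_rows)
```

If the sample counts printed here differ from those listed under Splits, the hosted data has likely changed since this page was generated.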

Labels

math-word-problems