
openai/gsm8k

GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. It supports question-answering tasks that require multi-step reasoning: problems typically take 2 to 8 steps of basic arithmetic (+, −, ×, ÷). The problems are within reach of a capable middle-school student, and most can be solved without explicitly defining variables. Solutions are written in natural language rather than pure mathematical notation. The dataset offers two configurations, "main" and "socratic", which share the same fields but differ in how the answer text is written.

Source

hugging_face

Created

Nov 28, 2025

Updated

Jan 4, 2024


Dataset Overview

Basic Information

Dataset Name: Grade School Math 8K (GSM8K)
Language: English
License: MIT
Multilinguality: Monolingual
Size Category: 1K < n < 10K
Source Dataset: Original data
Task Category: Text Generation
Labels: Mathematics word problems

Configurations

Main Configuration (main)

Features:
- question: question string
- answer: answer string
Splits:
- train: 7,473 samples, 3,963,202 bytes
- test: 1,319 samples, 713,732 bytes
Download Size: 2,725,633 bytes
Total Size: 4,676,934 bytes

Socratic Configuration (socratic)

Features:
- question: question string
- answer: answer string
Splits:
- train: 7,473 samples, 5,198,108 bytes
- test: 1,319 samples, 936,859 bytes
Download Size: 3,164,254 bytes
Total Size: 6,134,967 bytes
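
Either configuration can be loaded with the Hugging Face `datasets` library. The following is a minimal sketch, assuming the `datasets` package is installed and the Hub is reachable; the helper name `load_gsm8k` and the `VALID_CONFIGS` tuple are hypothetical conveniences, not part of the dataset.

```python
VALID_CONFIGS = ("main", "socratic")  # the two configurations listed above


def load_gsm8k(config="main"):
    """Return the DatasetDict for one GSM8K configuration.

    Hypothetical helper: downloads the data on first use.
    """
    if config not in VALID_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {VALID_CONFIGS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("openai/gsm8k", config)
```

Calling `load_gsm8k("socratic")["test"]` should then return the test split of the socratic configuration, and `load_gsm8k("main")["train"][0]` a dictionary with `question` and `answer` keys.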

Description

Summary

GSM8K is a high‑quality, linguistically diverse dataset of elementary mathematics word problems containing 8.5 K items. It is intended for question‑answering tasks that require multi‑step reasoning.

- Problems typically need 2–8 reasoning steps.
- Solutions primarily involve basic arithmetic (+, −, ×, ÷) to reach the final answer.
- A capable middle-school student should be able to solve each problem; most do not require explicit variable definitions.
- Answers are written in natural language rather than pure mathematical notation, reflecting the authors' belief that this format is the most useful for exposing the internal monologue of large language models.

Supported Tasks & Leaderboards

The dataset is commonly used to evaluate the logical and mathematical reasoning of language models and is included in evaluation suites such as the LLM Leaderboard.
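
Evaluation on this dataset is often done by comparing the final number in a model's output with the number after the `####` marker in the reference answer. The sketch below assumes that convention. The function names and the heuristic of taking the last number in the prediction are illustrative assumptions, not an official metric definition.

```python
import re

# Integers or decimals, optionally negative, with optional thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_number(answer: str) -> str:
    """Reference value: the text after the '####' marker, commas removed."""
    return answer.split("####")[-1].strip().replace(",", "")


def predicted_number(text: str):
    """Assumed heuristic: use the last number that appears in the model output."""
    matches = NUMBER.findall(text)
    return matches[-1].replace(",", "") if matches else None


def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions whose final number equals the reference number."""
    hits = 0
    for pred, ref in zip(predictions, references):
        p = predicted_number(pred)
        if p is not None and float(p) == float(gold_number(ref)):
            hits += 1
    return hits / len(references) if references else 0.0
```

Comparing as floats means that "72" and "72.0" count as a match; a stricter string comparison is an equally reasonable choice.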

Data Instances

Main Configuration (main)

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
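
The `<<48/2=24>>` markers in the answer above are calculator annotations of the form `<<expression=result>>`. As a rough sketch, they can be extracted and checked for arithmetic consistency. The code assumes every expression uses only numbers and the operators + - * /; anything else raises `ValueError`. The function names are hypothetical.

```python
import ast
import operator
import re

ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Evaluate a parsed expression restricted to numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def check_annotations(answer: str, tol: float = 1e-6):
    """Return (expression, stated_result, is_consistent) per annotation."""
    results = []
    for expr, stated in ANNOTATION.findall(answer):
        value = _eval(ast.parse(expr.strip(), mode="eval"))
        results.append((expr, stated, abs(value - float(stated)) <= tol))
    return results
```

Parsing with `ast` rather than calling `eval` keeps the check limited to plain arithmetic, which matters when the strings come from downloaded data.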

Socratic Configuration (socratic)

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "How many clips did Natalia sell in May? ** Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nHow many clips did Natalia sell altogether in April and May? ** Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
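
In the socratic instance above, each reasoning line pairs a sub-question with its solution step, separated by ` ** `. A minimal sketch of splitting such an answer follows, assuming every step line uses that separator and the final answer follows `####`. The function name `split_socratic` is hypothetical.

```python
def split_socratic(answer: str):
    """Split a socratic answer into (sub_question, step) pairs and the final answer."""
    body, _, final = answer.partition("####")
    pairs = []
    for line in body.strip().splitlines():
        sub_question, sep, step = line.partition(" ** ")
        if sep:  # skip any line that lacks the ' ** ' separator
            pairs.append((sub_question.strip(), step.strip()))
    return pairs, final.strip()
```

Applied to the instance above, this should yield two pairs, one per sub-question, and the final answer string "72".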

Fields

question: elementary mathematics problem string
answer: complete answer string containing multi‑step reasoning, calculator annotations, and final numeric solution
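
Because the answer field bundles the reasoning, the calculator annotations, and the final number into one string, downstream code usually has to separate them. The sketch below assumes the final answer follows the last `####` marker and that annotations are wrapped in `<<...>>`, as in the instances shown under Data Instances. The function names are hypothetical.

```python
import re


def strip_annotations(answer: str) -> str:
    """Remove <<...>> calculator annotations, leaving the readable text."""
    return re.sub(r"<<[^<>]*>>", "", answer)


def parse_answer(answer: str):
    """Split an answer into (reasoning, final_answer) at the last '####' marker."""
    reasoning, sep, final = answer.rpartition("####")
    if not sep:
        raise ValueError("no '####' marker found")
    # Commas are dropped so that values such as "1,000" compare cleanly.
    return strip_annotations(reasoning).strip(), final.strip().replace(",", "")
```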

Splits

Name	Train	Test
main	7,473	1,319
socratic	7,473	1,319
