Dataset asset, Open Source Community, Natural Language Processing, Mathematical Word Problems

openai/gsm8k

GSM8K (Grade School Math 8K) is a dataset of 8.5K high‑quality, linguistically diverse grade‑school math word problems. It supports question‑answering tasks that require multi‑step reasoning: each problem typically takes 2–8 steps of basic arithmetic (+, –, ×, ÷). A capable middle‑school student should be able to solve every problem, and most can be solved without explicitly defining variables. Solutions are written in natural language rather than pure mathematical notation. The dataset offers two configurations, "main" and "socratic", which share the same questions but format the answers differently.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 4, 2024

Dataset Overview

Basic Information

  • Dataset Name: Grade School Math 8K (GSM8K)
  • Language: English
  • License: MIT
  • Multilinguality: Monolingual
  • Size Category: 1K < n < 10K
  • Source Dataset: Original data
  • Task Category: Text Generation
  • Labels: Mathematics word problems

Configurations

Main Configuration (main)

  • Features:
    • question: string
    • answer: string
  • Splits:
    • train: 7,473 samples, 3,963,202 bytes
    • test: 1,319 samples, 713,732 bytes
  • Download Size: 2,725,633 bytes
  • Total Size: 4,676,934 bytes

Socratic Configuration (socratic)

  • Features:
    • question: string
    • answer: string
  • Splits:
    • train: 7,473 samples, 5,198,108 bytes
    • test: 1,319 samples, 936,859 bytes
  • Download Size: 3,164,254 bytes
  • Total Size: 6,134,967 bytes
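
A minimal sketch of loading either configuration, assuming the Hugging Face `datasets` library is installed and the repository id is `openai/gsm8k` as listed on this page. The helper names `load_gsm8k` and `check_split_sizes` are hypothetical and used only for illustration; the expected row counts are copied from the split listings above.

```python
# Row counts per split, as listed for both configurations on this page.
EXPECTED_ROWS = {"train": 7473, "test": 1319}

VALID_CONFIGS = ("main", "socratic")


def load_gsm8k(config: str = "main"):
    """Load one GSM8K configuration from the Hugging Face Hub (needs network)."""
    if config not in VALID_CONFIGS:
        raise ValueError(f"config must be one of {VALID_CONFIGS}, got {config!r}")
    # Imported lazily so this module can be imported without the library installed.
    from datasets import load_dataset

    return load_dataset("openai/gsm8k", config)


def check_split_sizes(dataset) -> bool:
    """Return True if every expected split has the listed number of rows."""
    return all(len(dataset[name]) == rows for name, rows in EXPECTED_ROWS.items())
```

With network access, `check_split_sizes(load_gsm8k("socratic"))` is one way to confirm that a download matches the listing; the check works on any mapping from split name to a sized collection.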

Description

Summary

GSM8K is a high‑quality, linguistically diverse dataset of elementary mathematics word problems containing 8.5 K items. It is intended for question‑answering tasks that require multi‑step reasoning.

  • Problems typically need 2–8 reasoning steps.
  • Solutions primarily involve basic arithmetic (+, –, ×, ÷) to reach the final answer.
  • An adept middle‑school student should be able to solve each problem; most do not require explicit variable definitions.
  • Solutions are written in natural language rather than pure mathematical notation; the authors consider this the most generally useful format and expect it to shed light on the internal monologue of large language models.

Supported Tasks & Leaderboards

The dataset is commonly used to evaluate logical and mathematical capabilities of language models and appears in benchmarks such as the LLM Leaderboard.
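
Reference answers in this dataset end with a line of the form `#### <number>`, so one simple way to score a model is exact match on that final number. The sketch below assumes, hypothetically, that model outputs were prompted to follow the same `####` convention. It is not the scoring rule of any particular leaderboard, and the function names are illustrative.

```python
import re

# Final answer marker: "####" followed by a number, optionally with commas.
FINAL = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def final_number(text: str):
    """Return the number after '####' as a float, or None if there is none."""
    match = FINAL.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions whose final number equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = final_number(prediction)
        expected = final_number(reference)
        # A prediction without a parseable final number counts as wrong.
        if predicted is not None and predicted == expected:
            correct += 1
    return correct / len(references)
```

Comparing parsed numbers rather than raw strings treats `72`, `72.0`, and `1,000` versus `1000` consistently; stricter or looser matching is a design choice that changes reported scores.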

Data Instances

Main Configuration (main)

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
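
The `<<expression=result>>` spans in the answer are calculator annotations. As a rough sketch, they can be re-checked by evaluating the left-hand side and comparing it with the stated result. This assumes each annotation contains only plain numbers combined with +, -, *, / and parentheses, which may not hold for every record; `check_annotations` is a hypothetical helper name.

```python
import ast
import operator
import re

# Matches <<expression=result>> and captures both sides.
ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")

# Only the four basic arithmetic operators are allowed.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def check_annotations(answer: str, tol: float = 1e-6):
    """Return (expression, is_consistent) for each annotation in the answer."""
    results = []
    for expression, claimed in ANNOTATION.findall(answer):
        value = _evaluate(ast.parse(expression.strip(), mode="eval"))
        results.append((expression.strip(), abs(value - float(claimed)) <= tol))
    return results
```

Walking the syntax tree instead of calling `eval` keeps the check limited to arithmetic, so an unexpected annotation raises `ValueError` rather than executing arbitrary code.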

Socratic Configuration (socratic)

{
    "question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
    "answer": "How many clips did Natalia sell in May? ** Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nHow many clips did Natalia sell altogether in April and May? ** Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
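
In the socratic instance above, each line pairs a sub-question with its solution step, separated by ` ** `. A minimal sketch for splitting such an answer into pairs follows. It assumes every step line uses exactly that separator, as in this example, and `socratic_steps` is a hypothetical helper name.

```python
def socratic_steps(answer: str) -> list:
    """Split a socratic answer into (sub_question, solution_step) pairs."""
    steps = []
    for line in answer.split("\n"):
        # The final-answer line starts with "####" and ends the reasoning.
        if line.startswith("####"):
            break
        sub_question, separator, solution = line.partition(" ** ")
        # Lines without the separator are skipped rather than guessed at.
        if separator:
            steps.append((sub_question.strip(), solution.strip()))
    return steps
```

The length of the returned list also gives a quick count of reasoning steps for a problem.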

Fields

  • question: the text of a grade‑school math word problem
  • answer: the full solution string, containing multi‑step reasoning, calculator annotations (<<...>>), and the final numeric answer after "####"
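
As an illustration of how the answer field can be taken apart, the sketch below separates the reasoning text from the final number and removes the calculator annotations. It assumes the `<<...>>` and `#### <number>` conventions seen in the data instances; `split_answer` is a hypothetical helper name, not part of any library.

```python
import re

# Calculator annotations such as <<48/2=24>>.
ANNOTATION = re.compile(r"<<[^<>]*>>")
# Final answer marker: "####" followed by a number, optionally with commas.
FINAL = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def split_answer(answer: str):
    """Return (reasoning without annotations, final answer as a string)."""
    match = FINAL.search(answer)
    if match is None:
        raise ValueError("no '####' final-answer marker found")
    # Drop thousands separators so "1,000" becomes "1000".
    final = match.group(1).replace(",", "")
    reasoning = ANNOTATION.sub("", answer[: match.start()].strip())
    return reasoning, final
```

Returning the final answer as a string leaves the choice of numeric type (int, float, or Decimal) to the caller.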

Splits

Name       Train    Test
main       7,473    1,319
socratic   7,473    1,319