openai/gsm8k
GSM8K (Grade School Math 8K) is a dataset of 8.5K high‑quality, linguistically diverse elementary mathematics word problems. It supports question‑answering tasks that require multi‑step reasoning, typically 2–8 steps of basic arithmetic (+, –, ×, ÷). The problems are of middle‑school difficulty, and most can be solved without explicitly defining variables. Solutions are written in natural language rather than pure mathematical notation. The dataset offers two configurations, "main" and "socratic", which differ in answer format.
Description
Dataset Overview
Basic Information
- Dataset Name: Grade School Math 8K (GSM8K)
- Language: English
- License: MIT
- Multilinguality: Monolingual
- Size Category: 1K < n < 10K
- Source Dataset: Original data
- Task Category: Text Generation
- Labels: Mathematics word problems
Configurations
Main Configuration (main)
- Features:
  - question: string
  - answer: string
- Splits:
  - train: 7,473 samples, 3,963,202 bytes
  - test: 1,319 samples, 713,732 bytes
- Download Size: 2,725,633 bytes
- Total Size: 4,676,934 bytes
Socratic Configuration (socratic)
- Features:
  - question: string
  - answer: string
- Splits:
  - train: 7,473 samples, 5,198,108 bytes
  - test: 1,319 samples, 936,859 bytes
- Download Size: 3,164,254 bytes
- Total Size: 6,134,967 bytes
Summary
GSM8K is a high‑quality, linguistically diverse dataset of 8.5K elementary mathematics word problems. It is intended for question‑answering tasks that require multi‑step reasoning.
- Problems typically need 2–8 reasoning steps.
- Solutions primarily involve basic arithmetic (+, –, ×, ÷) to reach the final answer.
- An adept middle‑school student should be able to solve each problem; most do not require explicit variable definitions.
- Solutions are written in natural language rather than pure mathematical notation; the authors consider this the most generally useful format and expect it to shed light on the internal monologues of large language models.
Supported Tasks & Leaderboards
The dataset is commonly used to evaluate logical and mathematical capabilities of language models and appears in benchmarks such as the LLM Leaderboard.
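Evaluations of this kind usually compare only the final numeric answer, which appears on the last line of each gold answer after the `####` marker. The sketch below assumes the common convention of taking the last number in a model's free-form output as its prediction; the helper names (`gold_answer`, `predicted_answer`, `is_correct`, `accuracy`) are hypothetical and are not part of the dataset or of any particular leaderboard's scoring harness.

```python
import re
from typing import List, Optional

# Matches integers and decimals, allowing thousands separators such as 1,234.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gold_answer(answer: str) -> str:
    """Final answer: the text after the last '####' marker, commas removed."""
    return answer.rpartition("####")[2].strip().replace(",", "")


def predicted_answer(output: str) -> Optional[str]:
    """Assumption: the last number in a model's free-form output is its answer."""
    matches = NUMBER.findall(output)
    return matches[-1].replace(",", "") if matches else None


def is_correct(output: str, answer: str) -> bool:
    """Exact match on the final number (hypothetical scoring rule)."""
    pred, gold = predicted_answer(output), gold_answer(answer)
    if pred is None:
        return False
    try:
        return float(pred) == float(gold)  # treats "72" and "72.0" as equal
    except ValueError:
        return pred == gold


def accuracy(outputs: List[str], answers: List[str]) -> float:
    """Fraction of paired model outputs whose final number matches the gold answer."""
    scores = [is_correct(o, a) for o, a in zip(outputs, answers)]
    return sum(scores) / len(scores) if scores else 0.0
```

Taking the last number is a simple heuristic; harnesses that prompt the model to end with its own `####` line can extract the prediction the same way as the gold answer instead.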
Data Instances
Main Configuration (main)
{
"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
"answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
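In the main configuration, the answer holds one reasoning step per line, with calculator annotations in `<<...>>` and the final answer after `####`. A minimal sketch that splits an answer along those lines, assuming every answer ends with a `####` line; `ParsedAnswer` and `parse_answer` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

# Calculator annotations such as <<48/2=24>> are embedded in the reasoning text.
CALC_ANNOTATION = re.compile(r"<<[^<>]*>>")


@dataclass
class ParsedAnswer:
    steps: List[str]  # one natural-language reasoning step per line, annotations removed
    final: str        # the final answer that follows "####"


def parse_answer(answer: str) -> ParsedAnswer:
    """Split a main-configuration answer into reasoning steps and the final answer."""
    body, _, final = answer.rpartition("####")
    steps = [
        CALC_ANNOTATION.sub("", line).strip()
        for line in body.split("\n")
        if line.strip()
    ]
    return ParsedAnswer(steps=steps, final=final.strip())
```

For the Natalia example this yields two steps and the final answer `"72"`; `len(parse_answer(a).steps)` is only a rough proxy for the 2–8 reasoning steps the card describes.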
Socratic Configuration (socratic)
{
"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?",
"answer": "How many clips did Natalia sell in May? ** Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nHow many clips did Natalia sell altogether in April and May? ** Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72"
}
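In the socratic configuration, each reasoning line starts with a sub-question followed by the separator ` ** `. A minimal sketch that pairs each sub-question with its step, assuming that format holds on every line before the `####` line; `socratic_steps` is a hypothetical helper.

```python
from typing import List, Tuple

SEPARATOR = " ** "  # separates a sub-question from its reasoning step


def socratic_steps(answer: str) -> List[Tuple[str, str]]:
    """Return (sub_question, step) pairs from a socratic-configuration answer."""
    body = answer.rpartition("####")[0]  # drop the final-answer line
    pairs = []
    for line in body.strip().split("\n"):
        if not line.strip():
            continue
        sub_question, sep, step = line.partition(SEPARATOR)
        if sep:
            pairs.append((sub_question.strip(), step.strip()))
        else:
            # Assumption: a line without the separator is a step with no sub-question.
            pairs.append(("", line.strip()))
    return pairs
```

For the Natalia example, dropping the sub-questions and rejoining the steps with newlines, plus the `#### 72` line, reproduces the main-configuration answer; whether that holds for every problem is not stated in this card.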
Fields
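- question: the elementary mathematics problem (string)
- answer: the complete solution (string), containing the multi‑step reasoning, calculator annotations of the form <<48/2=24>>, and the final numeric answer on the last line after ####. In the socratic configuration, each reasoning step is preceded by a sub-question and the separator **.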
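Each calculator annotation has the form `<<expression=result>>`, so it can be re-checked mechanically. The sketch below evaluates the expression with a small `ast`-based evaluator that accepts only numeric literals and `+ - * /` (never `eval`), then compares it with the stated result. `safe_eval` and `check_annotations` are hypothetical names, and the assumption that annotations contain only these operators is ours; anything else is reported as a failed check.

```python
import ast
import operator
import re
from typing import List, Tuple

# Captures the expression and the stated result inside <<expression=result>>.
ANNOTATION = re.compile(r"<<([^=<>]+)=([^<>]+)>>")
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_eval(expr: str) -> float:
    """Evaluate +, -, *, / over numeric literals only; no names or calls."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


def check_annotations(answer: str, tol: float = 1e-6) -> List[Tuple[str, str, bool]]:
    """Return (expression, stated_result, matches) for every annotation in an answer."""
    results = []
    for expr, stated in ANNOTATION.findall(answer):
        try:
            ok = abs(safe_eval(expr) - float(stated)) <= tol
        except (ValueError, SyntaxError, ZeroDivisionError):
            ok = False  # unparsable or unsupported annotation
        results.append((expr, stated, ok))
    return results
```

On the Natalia answer, both `48/2=24` and `48+24=72` check out.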
Splits
| Configuration | Train | Test |
|---|---|---|
| main | 7,473 | 1,319 |
| socratic | 7,473 | 1,319 |
Source
Organization: hugging_face