allenai/math_qa
We introduce a large-scale dataset of math word problems, built by annotating the AQuA-RAT dataset with fully specified procedural programs written in a new representation language. AQuA-RAT provides each problem's text, answer options, rationale, and correct answer.
Updated 1/18/2024
hugging_face
Description
Dataset Overview
Dataset Summary
- Name: MathQA
- Language: English
- Creators: Crowdsourced and expert-generated
- License: Apache‑2.0
- Multilinguality: Monolingual
- Size: 10K < n < 100K
- Source Dataset: Extended from AQuA‑RAT
- Task Type: Question Answering
- Task ID: Multiple‑choice QA
- Papers with Code ID: mathqa
Data Structure
Data Instances
An example from the training set:
{
"Problem": "a multiple choice test consists of 4 questions , and each question has 5 answer choices . in how many r ways can the test be completed if every question is unanswered ?",
"Rationale": "\"5 choices for each of the 4 questions , thus total r of 5 * 5 * 5 * 5 = 5 ^ 4 = 625 ways to answer all of them . answer : c .\"",
"annotated_formula": "power(5, 4)",
"category": "general",
"correct": "c",
"linear_formula": "power(n1,n0)|",
"options": "a ) 24 , b ) 120 , c ) 625 , d ) 720 , e ) 1024"
}
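The `linear_formula` field can be read as a small program over the numbers in `Problem`. The sketch below assumes, based on the example above, that `n0`, `n1`, ... index the numbers in the problem text in order of appearance, that `#k` refers to the result of step `k`, that `const_`-prefixed tokens are numeric constants, and that steps are separated by `|`. The operation table `OPS` and the helpers `extract_numbers` and `run_linear_formula` are hypothetical and cover only a few operations.

```python
import re

# Hypothetical operation table: only a few MathQA-style operations are mapped.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
    "power": lambda a, b: a ** b,
}


def extract_numbers(problem: str) -> list[float]:
    """Return the numbers in the problem text in order of appearance (n0, n1, ...)."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", problem)]


def run_linear_formula(formula: str, numbers: list[float]) -> float:
    """Evaluate a '|'-separated program such as 'power(n1,n0)|'."""
    results: list[float] = []  # '#k' refers to the result of step k
    for step in filter(None, formula.split("|")):
        match = re.fullmatch(r"(\w+)\((.*)\)", step.strip())
        if match is None:
            raise ValueError(f"cannot parse step {step!r}")
        op, args = match.groups()
        values = []
        for arg in (a.strip() for a in args.split(",")):
            if arg.startswith("#"):
                values.append(results[int(arg[1:])])
            elif arg.startswith("const_"):
                # Assumption: numeric constants such as const_100 or const_0_25.
                values.append(float(arg[len("const_"):].replace("_", ".")))
            elif arg.startswith("n"):
                values.append(numbers[int(arg[1:])])
            else:
                raise ValueError(f"unsupported argument {arg!r}")
        results.append(OPS[op](*values))
    return results[-1]


problem = ("a multiple choice test consists of 4 questions , and each question "
           "has 5 answer choices . in how many r ways can the test be completed "
           "if every question is unanswered ?")
print(run_linear_formula("power(n1,n0)|", extract_numbers(problem)))  # prints 625.0
```

For this example the extracted numbers are 4 and 5, so `power(n1,n0)` evaluates 5^4 = 625, which matches option c and the `annotated_formula` value `power(5, 4)`.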
Data Fields
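- Problem: problem text (string)
- Rationale: step-by-step reasoning leading to the answer (string)
- options: all answer options in one string, e.g. "a ) 24 , b ) 120 , ..." (string)
- correct: label of the correct option, a to e (string)
- annotated_formula: operation program with the problem's numbers written in, e.g. power(5, 4) (string)
- linear_formula: the same program with numbers referenced by position, e.g. power(n1,n0)| (string)
- category: problem category, e.g. general (string)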
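Because `options` is stored as a single string, it usually has to be split before its entries can be matched against `correct`. A minimal sketch, assuming every row uses the `a ) ... , b ) ...` layout shown in the example; `parse_options` is a hypothetical helper name, and rows formatted differently would need extra handling.

```python
import re

# Assumption: options follow the "a ) 24 , b ) 120 , ..." layout from the example.
# Each label a-e is followed by ")", and its text runs up to the next ", <label> )" or the end.
OPTION_RE = re.compile(r"([a-e])\s*\)\s*(.*?)\s*(?=,\s*[a-e]\s*\)|$)")


def parse_options(options: str) -> dict[str, str]:
    """Map each option label (a-e) to its answer text."""
    return dict(OPTION_RE.findall(options))


opts = parse_options("a ) 24 , b ) 120 , c ) 625 , d ) 720 , e ) 1024")
print(opts["c"])  # prints 625, the text of the correct option in the example
```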
Data Splits
| Split | Train | Validation | Test |
|---|---|---|---|
| Examples | 29,837 | 4,475 | 2,985 |
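The split sizes above add up to 37,297 examples and work out to roughly an 80/12/8 percent train/validation/test division. A short check of that arithmetic, with the counts copied from the table; `split_fractions` is an illustrative helper, not part of any library.

```python
# Example counts copied from the Data Splits table.
SPLITS = {"train": 29_837, "validation": 4_475, "test": 2_985}


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of all examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


print(sum(SPLITS.values()))  # 37297 examples in total
for name, share in split_fractions(SPLITS).items():
    print(f"{name:>10}: {share:.1%}")
```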
Dataset Creation
Dataset Information
- Download Size: 7,302,821 bytes
- Dataset Size: 22,965,979 bytes
Split Details
- Test Set: 1,844,184 bytes, 2,985 samples
- Train Set: 18,368,826 bytes, 29,837 samples
- Validation Set: 2,752,969 bytes, 4,475 samples
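The per-split byte counts add up exactly to the reported Dataset Size of 22,965,979 bytes, and every split averages a little over 600 bytes per example. The snippet below recomputes both from the numbers listed above; `check_split_bytes` is an illustrative helper, not part of any library.

```python
# (bytes, examples) per split, copied from the Split Details list.
SPLIT_DETAILS = {
    "train": (18_368_826, 29_837),
    "validation": (2_752_969, 4_475),
    "test": (1_844_184, 2_985),
}
DATASET_SIZE = 22_965_979  # reported "Dataset Size" in bytes


def check_split_bytes(details: dict[str, tuple[int, int]], dataset_size: int) -> bool:
    """Return True if the per-split byte counts add up to the reported total."""
    return sum(nbytes for nbytes, _ in details.values()) == dataset_size


print(check_split_bytes(SPLIT_DETAILS, DATASET_SIZE))  # True
for name, (nbytes, count) in SPLIT_DETAILS.items():
    print(f"{name}: {nbytes / count:.1f} bytes per example")
```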
License Information
The dataset is released under the Apache License, Version 2.0.
Citation
@inproceedings{amini-etal-2019-mathqa,
title = "{M}ath{QA}: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms",
author = "Amini, Aida and
Gabriel, Saadia and
Lin, Shanchuan and
Koncel-Kedziorski, Rik and
Choi, Yejin and
Hajishirzi, Hannaneh",
booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/N19-1245",
doi = "10.18653/v1/N19-1245",
pages = "2357--2367",
}
Topics
Math Problem Solving
Natural Language Processing
Source
Organization: hugging_face
Created: Unknown