allenai/math_qa

MathQA is a large‑scale dataset of math word problems. It was built by annotating the AQuA‑RAT dataset with a new representation language, which yields a fully specified operational program for each problem. AQuA‑RAT supplies the problems, answer options, rationales, and correct options.

Updated 1/18/2024
hugging_face

Description

Dataset Overview

Dataset Summary

  • Name: MathQA
  • Language: English
  • Creator: Crowdsourced and expert generated
  • License: Apache‑2.0
  • Multilinguality: Monolingual
  • Size: 10K < n < 100K examples
  • Source Dataset: Extended from AQuA‑RAT
  • Task Type: Question Answering
  • Task ID: Multiple‑choice QA
  • Paper ID: mathqa

Data Structure

Data Instances

An example from the training set:

{
    "Problem": "a multiple choice test consists of 4 questions , and each question has 5 answer choices . in how many r ways can the test be completed if every question is unanswered ?",
    "Rationale": "\"5 choices for each of the 4 questions , thus total r of 5 * 5 * 5 * 5 = 5 ^ 4 = 625 ways to answer all of them . answer : c .\"",
    "annotated_formula": "power(5, 4)",
    "category": "general",
    "correct": "c",
    "linear_formula": "power(n1,n0)|",
    "options": "a ) 24 , b ) 120 , c ) 625 , d ) 720 , e ) 1024"
}
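
The formula fields in the instance above can be checked by hand: the problem mentions 4 and then 5, so if n0 and n1 index the problem's numbers in order of appearance, power(n1,n0) is 5^4 = 625, which matches option c. The sketch below works through that under stated assumptions. The indexing convention, the `#K` back-reference to an earlier step, the `OPS` table, and both helper names are illustrative assumptions; this card does not define the notation, and only the `power` operation from the example is supported.

```python
import re

# Hypothetical operation table; only `power` appears in the example above.
OPS = {"power": lambda a, b: a ** b}


def extract_numbers(problem):
    """Return the numbers in the problem text, in order of appearance."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", problem)]


def evaluate_linear_formula(formula, numbers):
    """Evaluate a '|'-separated chain of op(arg,arg) steps.

    Assumes nK is the K-th number in the problem and #K is the result
    of the K-th earlier step (both assumptions made for illustration).
    """
    results = []
    for step in filter(None, formula.split("|")):
        name, arg_text = re.fullmatch(r"(\w+)\((.*)\)", step.strip()).groups()
        args = []
        for arg in arg_text.split(","):
            arg = arg.strip()
            if arg.startswith("n"):
                args.append(numbers[int(arg[1:])])
            elif arg.startswith("#"):
                args.append(results[int(arg[1:])])
            else:
                raise ValueError(f"unsupported argument: {arg}")
        results.append(OPS[name](*args))
    return results[-1]


problem = (
    "a multiple choice test consists of 4 questions , "
    "and each question has 5 answer choices ."
)
numbers = extract_numbers(problem)
value = evaluate_linear_formula("power(n1,n0)|", numbers)
print(value)  # 625.0
```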

Data Fields

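  • Problem: text of the word problem (string)
  • Rationale: free‑text explanation of the solution, which in the example above ends by naming the answer letter (string)
  • options: all answer choices in a single string, labeled a through e in the example above (string)
  • correct: letter of the correct option (string)
  • annotated_formula: the solution program written with the problem's numeric values, e.g. power(5, 4) (string)
  • linear_formula: the same program as a sequence of operations over numbered arguments, each step terminated by "|", e.g. power(n1,n0)| (string)
  • category: problem category, e.g. general (string)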
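
Because options is a single string rather than a list, looking up the answer text means splitting it first. A minimal sketch, assuming every record follows the "a ) ... , b ) ..." layout of the training example; `parse_options` is a hypothetical helper, not part of any dataset tooling.

```python
import re


def parse_options(options):
    """Split 'a ) 24 , b ) 120 , ...' into {'a': '24', 'b': '120', ...}."""
    # Each option is a label a-e, then ' ) ', then text running up to the
    # next ' , <label> ) ' separator or the end of the string.
    pairs = re.findall(r"([a-e]) \) (.*?)(?= , [a-e] \) |$)", options)
    return {label: text.strip() for label, text in pairs}


example = {
    "options": "a ) 24 , b ) 120 , c ) 625 , d ) 720 , e ) 1024",
    "correct": "c",
}
choices = parse_options(example["options"])
answer = choices[example["correct"]]
print(answer)  # 625
```

The values stay as strings here, since option text in other records may carry units or symbols that do not convert cleanly to numbers.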

Data Splits

Split    Train     Validation    Test
Size     29,837    4,475         2,985

Dataset Creation

Dataset Information

  • Download Size: 7,302,821 bytes
  • Dataset Size: 22,965,979 bytes

Split Details

  • Test Set: 1,844,184 bytes, 2,985 samples
  • Train Set: 18,368,826 bytes, 29,837 samples
  • Validation Set: 2,752,969 bytes, 4,475 samples
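
As a quick consistency check, the per‑split figures above can be summed and compared with the reported dataset size. The sketch below uses only the numbers listed on this page; the dictionary layout is an illustrative choice.

```python
# Per-split figures copied from the list above.
splits = {
    "train": {"bytes": 18_368_826, "samples": 29_837},
    "validation": {"bytes": 2_752_969, "samples": 4_475},
    "test": {"bytes": 1_844_184, "samples": 2_985},
}

total_samples = sum(info["samples"] for info in splits.values())
total_bytes = sum(info["bytes"] for info in splits.values())

print(total_samples)  # 37297
print(total_bytes)    # 22965979, equal to the reported dataset size

# Share of samples in each split, rounded to one decimal place.
shares = {
    name: round(100 * info["samples"] / total_samples, 1)
    for name, info in splits.items()
}
print(shares)  # {'train': 80.0, 'validation': 12.0, 'test': 8.0}
```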

License Information

The dataset is released under the Apache License, Version 2.0.

Citation

@inproceedings{amini-etal-2019-mathqa,
    title = "{M}ath{QA}: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms",
    author = "Amini, Aida  and
      Gabriel, Saadia  and
      Lin, Shanchuan  and
      Koncel-Kedziorski, Rik  and
      Choi, Yejin  and
      Hajishirzi, Hannaneh",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N19-1245",
    doi = "10.18653/v1/N19-1245",
    pages = "2357--2367",
}

Topics

Math Problem Solving
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown