Tags: Dataset, Open Source Community, Natural Language Processing, Mathematical Reasoning

MathCritique-76k

MathCritique-76k is a dataset of model responses to mathematical reasoning problems paired with step-level feedback, intended for training and evaluating large language models (LLMs) on mathematical reasoning. It was collected with an automated, scalable framework and is designed to teach models to generate natural-language feedback that improves performance on mathematical reasoning tasks.

Source: GitHub
Created: Nov 26, 2024
Updated: Nov 26, 2024

MathCritique Dataset Overview

Dataset Introduction

  • Name: MathCritique‑76k
  • Source: Collected automatically by the AutoMathCritique framework; contains responses to mathematical reasoning problems together with step-level feedback on those responses.
  • Purpose: Fine-tuning language models to generate natural-language feedback on mathematical reasoning.
  • Features:
    • Follows a two-player paradigm in which reasoning and critique are handled by separate models.
    • The critique model supervises the reasoning model by providing step-level feedback at both training time and test time.
    • Training with the dataset helps improve the reasoning model's performance on challenging queries, especially when reasoning time is extended at test time.
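
To make the two-player setup concrete, here is a minimal Python sketch of a test-time critique-and-refine loop. The `reasoner` and `critic` callables, the `StepFeedback` record, and the rule of stopping once the critic flags no step are illustrative assumptions, not the AutoMathCritique implementation; in a real system both roles would be LLM calls, and the toy stand-ins below exist only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepFeedback:
    """Hypothetical step-level verdict produced by a critique model."""
    step_index: int   # 0-based index of the solution step being judged
    is_correct: bool
    comment: str


# Hypothetical role signatures; real systems would wrap LLM calls here.
# A reasoner maps (problem, feedback from the last round or None) to solution steps.
Reasoner = Callable[[str, Optional[List[StepFeedback]]], List[str]]
# A critic maps (problem, solution steps) to one verdict per step.
Critic = Callable[[str, List[str]], List[StepFeedback]]


def first_error(feedback: List[StepFeedback]) -> Optional[StepFeedback]:
    """Return the earliest step flagged as incorrect, or None if all steps pass."""
    for fb in feedback:
        if not fb.is_correct:
            return fb
    return None


def critique_refine(problem: str, reasoner: Reasoner, critic: Critic,
                    max_rounds: int = 3) -> List[str]:
    """Alternate critique and refinement until the critic finds no error.

    After max_rounds critiques, the latest (possibly uncritiqued) attempt is returned.
    """
    steps = reasoner(problem, None)
    for _ in range(max_rounds):
        feedback = critic(problem, steps)
        if first_error(feedback) is None:
            return steps  # critic accepts every step
        steps = reasoner(problem, feedback)  # refine using the step-level feedback
    return steps


# Toy stand-ins so the sketch runs end to end.
def toy_reasoner(problem: str, feedback: Optional[List[StepFeedback]]) -> List[str]:
    if feedback is None:
        return ["2 + 2 = 5", "Answer: 5"]  # deliberately wrong first attempt
    return ["2 + 2 = 4", "Answer: 4"]


def toy_critic(problem: str, steps: List[str]) -> List[StepFeedback]:
    return [
        StepFeedback(i, "= 5" not in s, "arithmetic error" if "= 5" in s else "ok")
        for i, s in enumerate(steps)
    ]


print(critique_refine("What is 2 + 2?", toy_reasoner, toy_critic))
# ['2 + 2 = 4', 'Answer: 4']
```

Capping the loop with `max_rounds` is one simple way to bound extra test-time compute; spending more rounds corresponds to extending reasoning time.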

Dataset Structure

  • Raw Data: Built on the GSM8K and MATH training sets; each example contains a problem and its answer.
  • New Data: Built from GPT-4 feedback; each example contains a problem, the feedback, and a refined answer.
  • Size: Only 100 examples have been released so far; more will follow.
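
The two record layouts listed above can be modeled as follows. This Python sketch assumes the data is stored as JSON Lines and that the keys are named `problem`, `answer`, `feedback`, and `refined_answer`; the file format, the key names, and the `parse_records` helper are all hypothetical and should be checked against the released files.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class RawRecord:
    """Original GSM8K/MATH-style item: a problem and its answer."""
    problem: str
    answer: str


@dataclass
class CritiqueRecord:
    """New item: a problem, GPT-4 feedback, and a refined answer."""
    problem: str
    feedback: str
    refined_answer: str


def parse_records(lines: Iterable[str]) -> List[Union[RawRecord, CritiqueRecord]]:
    """Parse JSON Lines text into typed records.

    The key names used here are hypothetical; verify them against the release.
    """
    records: List[Union[RawRecord, CritiqueRecord]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if "feedback" in obj:
            records.append(CritiqueRecord(obj["problem"], obj["feedback"], obj["refined_answer"]))
        else:
            records.append(RawRecord(obj["problem"], obj["answer"]))
    return records


sample = [
    '{"problem": "What is 3 * 4?", "answer": "12"}',
    '{"problem": "What is 3 * 4?", "feedback": "Step 1 is correct.", "refined_answer": "12"}',
]
for rec in parse_records(sample):
    print(type(rec).__name__, rec.problem)
```

Because `parse_records` accepts any iterable of lines, an open file handle works directly, e.g. `with open(path, encoding="utf-8") as f: records = parse_records(f)`, where `path` is a placeholder.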

Usage

  • Install Dependencies:
    • LLaMA-Factory dependencies
    • vLLM for inference
    • DeepSpeed for training
    • A custom version of transformers
  • Run Experiments:
    • Use the selfimprove/inference-all.sh script to run training, inference, and evaluation.
    • Key configuration parameters include the dataset path, the model name, and the sampling temperature.
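
Since the setup depends on LLaMA-Factory, which accepts custom supervised fine-tuning data in an Alpaca-style layout with `instruction`, `input`, and `output` fields, critique data would need to be converted into that shape before training. The Python sketch below shows one way to do this so that a model learns to emit feedback; the prompt text, the `response` field, and the helper names are hypothetical and not taken from the repository, which may prepare its data differently.

```python
import json
from typing import Dict, List

# Hypothetical instruction text for training a critique model; the
# repository's actual prompt template may differ.
CRITIQUE_INSTRUCTION = (
    "Review the following solution step by step. For each step, "
    "state whether it is correct and explain any error you find."
)


def to_alpaca_example(problem: str, response: str, feedback: str) -> Dict[str, str]:
    """Map one (problem, response, feedback) triple to an Alpaca-style record."""
    return {
        "instruction": CRITIQUE_INSTRUCTION,
        "input": f"Problem:\n{problem}\n\nSolution:\n{response}",
        "output": feedback,  # the critique model learns to produce this text
    }


def build_sft_json(rows: List[Dict[str, str]]) -> str:
    """Serialize rows (hypothetical keys: problem, response, feedback) as a JSON array."""
    examples = [to_alpaca_example(r["problem"], r["response"], r["feedback"]) for r in rows]
    return json.dumps(examples, ensure_ascii=False, indent=2)


rows = [{
    "problem": "What is 3 * 4?",
    "response": "Step 1: 3 * 4 = 12. Answer: 12",
    "feedback": "Step 1 is correct. The final answer 12 is correct.",
}]
print(build_sft_json(rows))
```

A file produced this way would typically also need an entry in LLaMA-Factory's `dataset_info.json` before it can be selected for training.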

License

Contact Information

Citation

@misc{xi2024enhancingllmreasoningcritique,
  title={Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision},
  author={Zhiheng Xi and Dingwen Yang and Jixuan Huang and Jiafu Tang and Guanyu Li and Yiwen Ding and Wei He and Boyang Hong and Shihan Dou and Wenyu Zhan and Xiao Wang and Rui Zheng and Tao Ji and Xiaowei Shi and Yitao Zhai and Rongxiang Weng and Jingang Wang and Xunliang Cai and Tao Gui and Zuxuan Wu and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Yu-Gang Jiang},
  year={2024},
  eprint={2411.16579},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.16579},
}
