MathCritique-76k

MathCritique‑76k is a dataset for training and evaluating large language models (LLMs) on mathematical reasoning. It pairs model responses with step‑level feedback and was collected with an automated, scalable framework. Its goal is to help models generate natural‑language feedback that improves performance on mathematical reasoning tasks.

Source

github

Created

Nov 26, 2024

Updated

Nov 26, 2024

Overview

Dataset description and usage context

MathCritique Dataset Overview

Dataset Introduction

Name: MathCritique‑76k
Source: Collected automatically by the AutoMathCritique framework; contains responses to mathematical reasoning tasks together with step‑level feedback on them.
Purpose: Fine‑tune language models to generate natural‑language feedback on mathematical reasoning.
Features:
- Utilizes a two‑player paradigm separating the reasoning and critique roles.
- The critique model provides step‑level feedback during both training and testing, supervising the reasoning model.
- The dataset helps improve the reasoning model's performance on challenging queries, especially when inference‑time computation is scaled up.
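
The two‑player setup described above can be pictured as a loop: a reasoning model proposes a solution, a critique model comments on each step, and the reasoning model refines its answer. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `reason`, `critique`, `refine`, and `max_rounds` are hypothetical, and the toy functions stand in for real model calls.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a solution is a list of step strings, and a
# critique holds one (is_correct, comment) pair per step.
Solution = List[str]
Critique = List[Tuple[bool, str]]


def critique_and_refine(
    problem: str,
    reason: Callable[[str], Solution],
    critique: Callable[[str, Solution], Critique],
    refine: Callable[[str, Solution, Critique], Solution],
    max_rounds: int = 3,
) -> Solution:
    """Generate a solution, then alternate critique and refinement until
    every step is judged correct or max_rounds is reached."""
    solution = reason(problem)
    for _ in range(max_rounds):
        feedback = critique(problem, solution)
        if all(ok for ok, _ in feedback):
            break
        solution = refine(problem, solution, feedback)
    return solution


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_reason(problem: str) -> Solution:
    return ["2 + 3 = 6", "answer: 6"]


def toy_critique(problem: str, solution: Solution) -> Critique:
    # Flags any step that still contains the wrong value 6.
    return [("6" not in step, "2 + 3 is 5, not 6") for step in solution]


def toy_refine(problem: str, solution: Solution, feedback: Critique) -> Solution:
    # Rewrites only the steps the critique marked as incorrect.
    return [
        step if ok else step.replace("6", "5")
        for step, (ok, _) in zip(solution, feedback)
    ]


result = critique_and_refine("What is 2 + 3?", toy_reason, toy_critique, toy_refine)
```

Passing the models in as callables keeps the reasoning and critique roles separate, which mirrors the role split the dataset is built around.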

Dataset Structure

Raw Data: Built on the GSM8k and MATH training sets; each example consists of a problem and its answer.
New Data: Built from GPT‑4 feedback; each example consists of a problem, feedback, and a refined answer.
Size: Currently 100 examples are released, with more to follow.
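
As a rough illustration of the new‑data layout (problem, feedback, refined answer), the sketch below parses records into a small typed structure. The JSON Lines format and the key names `problem`, `feedback`, and `refined_answer` are assumptions made for this example; check the released files for the actual keys.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CritiqueExample:
    # Field names are hypothetical; the released data may use other keys.
    problem: str
    feedback: str
    refined_answer: str


def parse_examples(jsonl_text: str) -> List[CritiqueExample]:
    """Parse JSON Lines text into CritiqueExample records, skipping blank lines."""
    examples = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        examples.append(
            CritiqueExample(
                problem=row["problem"],
                feedback=row["feedback"],
                refined_answer=row["refined_answer"],
            )
        )
    return examples


# A made-up record used only to exercise the parser.
sample = json.dumps({
    "problem": "What is 2 + 3?",
    "feedback": "Step 1 is incorrect: 2 + 3 is 5, not 6.",
    "refined_answer": "5",
})
examples = parse_examples(sample + "\n\n")
```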

Usage

Install Dependencies:
- LLaMA‑Factory dependencies
- vllm for inference
- deepspeed for training
- Custom transformers version
Run Experiments:
- Use the selfimprove/inference-all.sh script for training, inference, and evaluation.
- Key configuration parameters include dataset path, model name, sampling temperature, etc.
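
The script's actual variable names are not listed here, so the following is only a sketch of how the parameters named above (dataset path, model name, sampling temperature) might be collected and validated before a run. Every field name, default, and example value is a hypothetical placeholder, and the code does not call the script itself.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # All field names and defaults are hypothetical placeholders.
    dataset_path: str
    model_name: str
    temperature: float = 0.7

    def __post_init__(self) -> None:
        # Sampling temperature cannot be negative.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")


config = ExperimentConfig(dataset_path="data/example.json", model_name="my-model")
settings = asdict(config)
```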

License

Type: Apache 2.0 License
Link: Apache 2.0 License

Contact Information

Author: Zhiheng Xi
Email: zhxi22@m.fudan.edu.cn

Citation

@misc{xi2024enhancingllmreasoningcritique,
  title={Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision},
  author={Zhiheng Xi and Dingwen Yang and Jixuan Huang and Jiafu Tang and Guanyu Li and Yiwen Ding and Wei He and Boyang Hong and Shihan Do and Wenyu Zhan and Xiao Wang and Rui Zheng and Tao Ji and Xiaowei Shi and Yitao Zhai and Rongxiang Weng and Jingang Wang and Xunliang Cai and Tao Gui and Zuxuan Wu and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Yu-Gang Jiang},
  year={2024},
  eprint={2411.16579},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.16579}
}
