MathCritique-76k
MathCritique‑76k is a dataset for training and testing large language models (LLMs) on mathematical reasoning tasks, containing model responses and step‑level feedback. The dataset was collected via an automated, scalable framework and aims to help models generate natural‑language feedback, improving performance on mathematical reasoning tasks.
Description
MathCritique Dataset Overview
Dataset Introduction
- Name: MathCritique‑76k
- Source: Automatically collected by the AutoMathCritique framework, containing responses to mathematical reasoning tasks and their step‑level feedback.
- Purpose: Fine‑tune language models to generate natural‑language mathematical reasoning feedback.
- Features:
- Utilizes a two‑player paradigm separating the reasoning and critique roles.
- The critique model provides step‑level feedback during both training and testing, supervising the reasoning model.
- Critique supervision improves the reasoning model's performance on challenging queries, especially when more inference-time computation is allowed.
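The two-player paradigm described above can be pictured as a loop in which a reasoning model proposes steps and a critique model labels each step. The following is a minimal sketch under stated assumptions, not the framework's actual implementation: the names `StepFeedback`, `solve_with_critique`, `toy_actor`, and `toy_critic` are hypothetical, and the toy functions stand in for real models so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepFeedback:
    step_index: int   # position of the step in the response
    is_correct: bool  # critic's verdict for this step
    comment: str      # natural-language explanation of the verdict


# Hypothetical role signatures (assumptions for illustration):
# the actor maps (question, latest feedback) to a list of reasoning steps,
# the critic maps (question, steps) to one feedback item per step.
Actor = Callable[[str, List[StepFeedback]], List[str]]
Critic = Callable[[str, List[str]], List[StepFeedback]]


def solve_with_critique(question: str, actor: Actor, critic: Critic,
                        max_rounds: int = 3) -> List[str]:
    """Alternate actor and critic until no step is flagged or rounds run out."""
    feedback: List[StepFeedback] = []
    steps = actor(question, feedback)
    for _ in range(max_rounds):
        feedback = critic(question, steps)
        if all(item.is_correct for item in feedback):
            break
        steps = actor(question, feedback)  # refine using step-level feedback
    return steps


def toy_actor(question: str, feedback: List[StepFeedback]) -> List[str]:
    # Toy stand-in: answers wrongly at first, then corrects itself once flagged.
    if any(not item.is_correct for item in feedback):
        return ["2 + 3 = 5", "5 * 4 = 20"]
    return ["2 + 3 = 6", "6 * 4 = 24"]


def toy_critic(question: str, steps: List[str]) -> List[StepFeedback]:
    # Toy stand-in: checks each "a op b = c" step arithmetically.
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    feedback = []
    for index, step in enumerate(steps):
        lhs, rhs = step.split(" = ")
        a, op, b = lhs.split()
        ok = ops[op](int(a), int(b)) == int(rhs)
        comment = "ok" if ok else f"{lhs} does not equal {rhs}"
        feedback.append(StepFeedback(index, ok, comment))
    return feedback


final_steps = solve_with_critique("What is (2 + 3) * 4?", toy_actor, toy_critic)
```

Keeping the two roles as separate callables mirrors the separation of reasoning and critique: either side can be swapped out without touching the loop.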
Dataset Structure
- Raw Data: Built upon GSM8k and MATH training sets; each query includes a problem and its answer.
- New Data: Built from GPT‑4 feedback; each query includes a problem, feedback, and a refined answer.
- Size: Currently 100 examples are released, with more to follow.
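As a rough illustration of the two record types listed above, the sketch below models them as Python dataclasses and parses critique records from JSON lines. The field names (`problem`, `answer`, `feedback`, `refined_answer`) and the JSON Lines layout are assumptions made for this example; the released files may use different keys or a different format.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RawExample:
    # Assumed shape of a raw record: a problem and its answer.
    problem: str
    answer: str


@dataclass
class CritiqueExample:
    # Assumed shape of a new record: problem, feedback, refined answer.
    problem: str
    feedback: str
    refined_answer: str


def parse_critique_lines(lines: Iterable[str]) -> List[CritiqueExample]:
    """Parse JSON Lines text into CritiqueExample records, skipping blank lines."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        examples.append(
            CritiqueExample(
                problem=record["problem"],
                feedback=record["feedback"],
                refined_answer=record["refined_answer"],
            )
        )
    return examples


# A made-up record for demonstration only; it is not taken from the dataset.
sample_lines = [
    json.dumps({
        "problem": "What is 7 * 8?",
        "feedback": "Step 1 is incorrect: 7 * 8 is 56, not 54.",
        "refined_answer": "56",
    }),
    "",
]
parsed = parse_critique_lines(sample_lines)
```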
Usage
- Install Dependencies:
- LLaMA‑Factory dependencies
- vllm for inference
- deepspeed for training
- Custom transformers version
- Run Experiments:
- Use the selfimprove/inference-all.sh script for training, inference, and evaluation.
- Key configuration parameters include dataset path, model name, sampling temperature, etc.
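The configuration parameters named above could be gathered and sanity-checked in one place before launching an experiment. The sketch below is an assumption-laden illustration: `ExperimentConfig`, its field names, and the example values are hypothetical, and it does not reflect how the script itself reads its settings.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical container for the parameters named in the usage notes.
    dataset_path: str
    model_name: str
    temperature: float = 0.7  # assumed default, chosen only for illustration

    def __post_init__(self) -> None:
        if not self.dataset_path:
            raise ValueError("dataset_path must not be empty")
        if not self.model_name:
            raise ValueError("model_name must not be empty")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")


# Placeholder values; substitute the real dataset path and model name.
config = ExperimentConfig(
    dataset_path="data/example.jsonl",
    model_name="example-model",
    temperature=0.5,
)
settings = asdict(config)
```

Validating the values up front means a typo in a path or a negative temperature fails immediately rather than partway through a long training or inference run.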
License
- Type: Apache 2.0 License
- Link: Apache 2.0 License
Contact Information
- Author: Zhiheng Xi
- Email: zhxi22@m.fudan.edu.cn
Citation
@misc{xi2024enhancingllmreasoningcritique,
  title={Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision},
  author={Zhiheng Xi and Dingwen Yang and Jixuan Huang and Jiafu Tang and Guanyu Li and Yiwen Ding and Wei He and Boyang Hong and Shihan Do and Wenyu Zhan and Xiao Wang and Rui Zheng and Tao Ji and Xiaowei Shi and Yitao Zhai and Rongxiang Weng and Jingang Wang and Xunliang Cai and Tao Gui and Zuxuan Wu and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Yu-Gang Jiang},
  year={2024},
  eprint={2411.16579},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.16579},
}
Source
Organization: github
Created: 11/26/2024