Explore high-quality datasets for your AI and machine learning projects.
Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.
MathCritique‑76k is a dataset for training and testing large language models (LLMs) on mathematical reasoning tasks, containing model responses and step‑level feedback. The dataset was collected via an automated, scalable framework and aims to help models generate natural‑language feedback, improving performance on mathematical reasoning tasks.
GSM8K_zh is a Chinese dataset tailored for mathematical reasoning, consisting of problems and answers translated from the English GSM8K dataset. It includes 7,473 training samples and 1,319 test samples. Training samples contain full questions and answers; test samples provide only the translated questions. The dataset is suitable for Chinese–English question‑answering tasks, especially for mathematical problem solving.