
MATH-Vision (MATH-V)

Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs.

Updated 2/24/2024

Description

Dataset Overview

Dataset Name

  • MATH‑Vision (MATH‑V) Dataset

Description

  • MATH‑Vision (MATH‑V) Dataset is a collection of 3,040 high‑quality mathematics problems with visual context, sourced from real competition problems. The dataset spans 16 mathematical domains and is stratified into five difficulty levels, providing a comprehensive benchmark for evaluating large multimodal models (LMMs) on mathematical reasoning.

Features

  • Multimodal Mathematical Reasoning: Designed to assess models’ ability to reason mathematically with visual inputs.
  • Broad Topic Coverage: Includes 16 domains such as analytic geometry, topology, and graph theory.
  • Multiple Difficulty Levels: Problems are categorized into five levels from easy to hard.

Usage

  • Model Evaluation: Used to evaluate models such as GPT‑4, GPT‑4V, and Gemini on mathematical reasoning tasks.
  • Research Tool: Provides evaluation code and data to support further research in multimodal mathematical reasoning.
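To make the dataset layout concrete, the sketch below shows how one might load and summarize problems grouped by subject and difficulty level. The field names (`question`, `answer`, `subject`, `level`) are assumptions for illustration; the actual keys in the released JSON files may differ.

```python
import json
from collections import Counter

# Hypothetical records mimicking the assumed MATH-V layout;
# real entries would also carry an image reference.
sample_records = [
    {"id": "1", "question": "…", "answer": "C", "subject": "analytic geometry", "level": 2},
    {"id": "2", "question": "…", "answer": "5", "subject": "topology", "level": 4},
    {"id": "3", "question": "…", "answer": "A", "subject": "analytic geometry", "level": 5},
]

def summarize(records):
    """Count problems per subject and per difficulty level."""
    by_subject = Counter(r["subject"] for r in records)
    by_level = Counter(r["level"] for r in records)
    return by_subject, by_level

by_subject, by_level = summarize(sample_records)
print(by_subject)
print(by_level)
```

In practice `sample_records` would be replaced by `json.load(open(path))` over the downloaded dataset file; the same grouping then reports the full distribution over the 16 subjects and 5 levels.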


Related Work

  • Paper: Details of dataset construction and evaluation can be found on arXiv.

Example

  • Sample Content: Includes specific problems from fields such as analytic geometry, topology, and graph theory. Detailed examples are provided in Appendix D.3 of the paper.

Evaluation & Results

  • Model Performance: As of the latest update, GPT‑4o scores 30.39% on MATH‑V, while human performance is around 70%.
  • Evaluation Tools: Scripts are provided to compute accuracy and performance across disciplines and difficulty levels.
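The per-discipline and per-level accuracy computation described above can be sketched as follows. This is a minimal illustration, not the official evaluation script; the record fields (`id`, `answer`, `subject`, `level`) and the exact-match comparison are assumptions.

```python
from collections import defaultdict

def accuracy_by_group(records, predictions, key):
    """Accuracy grouped by a record field such as 'subject' or 'level'.

    records:     list of dicts with 'id', 'answer', and the grouping field
    predictions: dict mapping record id -> predicted answer string
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        pred = predictions.get(r["id"], "")
        # Simple normalized exact match; the official scorer may be stricter.
        if pred.strip().lower() == r["answer"].strip().lower():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

records = [
    {"id": "1", "answer": "C", "subject": "topology", "level": 1},
    {"id": "2", "answer": "5", "subject": "topology", "level": 3},
    {"id": "3", "answer": "A", "subject": "graph theory", "level": 3},
]
predictions = {"1": "C", "2": "4", "3": "a"}
print(accuracy_by_group(records, predictions, "subject"))
print(accuracy_by_group(records, predictions, "level"))
```

Running the same function with `key="subject"` and `key="level"` yields the two breakdowns the evaluation scripts report.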

Citation

  • BibTeX:
    @misc{wang2024measuring,
          title={Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset},
          author={Ke Wang and Junting Pan and Weikang Shi and Zimu Lu and Mingjie Zhan and Hongsheng Li},
          year={2024},
          eprint={2402.14804},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }
    



Topics

Mathematical Reasoning
Multimodal Learning

Source

Organization: github

Created: 2/17/2024
