MathBench

MathBench is a mathematics assessment dataset of 3,709 problems organized into five difficulty stages, from basic arithmetic to university-level topics. Questions are available in Chinese and English, and the benchmark uses a Circular Evaluation (CE) method, which repeats each multiple-choice question with reordered options, to give a more robust estimate of model ability.

Updated 5/21/2024

Description

Dataset Overview

Dataset Name

MathBench: A Hierarchical Mathematics Benchmark for Evaluating Theoretical and Applied Capabilities of Language Models.

Dataset Features

  1. Five‑Stage Difficulty Scheme: 3,709 questions spanning five educational stages from basic arithmetic to university level.
  2. Bilingual Evaluation: Problems are provided in both Chinese and English, including bilingual versions of the basic calculation tasks.
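  3. Circular Evaluation (CE): Each multiple-choice question is posed several times with the option order shuffled, and a model is credited only if it answers correctly every time, which reduces the reward for lucky guesses and position bias.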
  4. Theoretical Question Support: Each stage includes theory‑based questions to test conceptual understanding.
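
The Circular Evaluation idea in item 3 can be sketched as follows. This is a minimal illustration, not the MathBench or OpenCompass implementation: the function names `circular_variants` and `circular_eval` and the `answer_fn` callback are hypothetical, and the sketch assumes the reordering is a simple rotation of the option list, whereas the source only says the options are shuffled.

```python
from collections import deque

LABELS = "ABCDEFGH"  # option labels; assumes at most eight options


def circular_variants(options):
    """Return every rotation of the option list (assumed reordering scheme)."""
    rotated = deque(options)
    variants = []
    for _ in range(len(options)):
        variants.append(list(rotated))
        rotated.rotate(-1)  # shift left by one position
    return variants


def circular_eval(options, correct_option, answer_fn):
    """Return True only if answer_fn is right under every rotation.

    answer_fn is a hypothetical stand-in for a model call: it receives the
    reordered option texts and returns a label such as "A".
    """
    for variant in circular_variants(options):
        picked = answer_fn(variant)
        if variant[LABELS.index(picked)] != correct_option:
            return False
    return True


# A "model" that always answers "A" is right in one rotation but fails CE.
always_a = lambda opts: "A"
# A "model" that locates the correct text passes under every rotation.
finds_four = lambda opts: LABELS[opts.index("4")]

print(circular_eval(["3", "4", "5", "6"], "4", always_a))
print(circular_eval(["3", "4", "5", "6"], "4", finds_four))
```

Under this scheme a model that always picks the same position gets plain-accuracy credit on some rotations but no CE credit, which is why CE scores are typically lower than standard accuracy.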

Dataset Updates

  • 2024-05-20: Accepted at ACL 2024; additional model performance results released.
  • 2024-03-14: Full version released with 3,709 bilingual questions.
  • 2024-01-26: Application-oriented question subset released.

Dataset Structure

The structure diagram shows the distribution of questions across the five educational stages.

Model Performance

  • Models are evaluated with zero-shot CoT and few-shot CoT prompting on multiple-choice and text-based questions. Results are reported in tables using standard accuracy and CE metrics.

Application vs. Theory Performance

  • MathBench‑A: Shows model performance on applied questions.
  • MathBench‑T: Shows model performance on theoretical questions.

Bilingual Performance

The dataset supports bilingual evaluation; detailed performance numbers are omitted in the provided excerpt.

Model Size vs. Average Score

A plot illustrates the relationship between model parameters and MathBench scores, with GPT‑4‑0125‑Preview highlighted by a red dashed line.

Inference with OpenCompass

Detailed steps for running MathBench inference using the OpenCompass toolkit are provided, including installation, data preparation, and execution commands.

Citation & Technical Report

Reference information for citing the dataset is supplied.

Topics

Mathematical Assessment
Multilingual Support

Source

Organization: github

Created: 1/15/2024
