Dataset asset, Open Source Community, Multimodal Data, Mathematics Education
cmm-math
CMM‑Math is a Chinese multimodal mathematics dataset containing over 28,000 high‑quality samples covering 12 grades from primary school to high school. It includes diverse question types such as multiple‑choice and fill‑in‑the‑blank, with detailed solutions. Some questions involve visual context, making the dataset more challenging. The dataset is split into a training set (22,000+ samples) and an evaluation set (5,000+ samples).
Source
Hugging Face
Created
Sep 6, 2024
Updated
Sep 8, 2024
CMM‑Math Dataset Overview
Basic Information
- License: BSD‑3‑Clause
- Language: Chinese
- Task Type: Text Generation
Dataset Introduction
- Name: CMM‑Math
- Content: A Chinese multimodal mathematics dataset containing benchmark and training portions, intended for evaluating and enhancing the mathematical reasoning abilities of large multimodal models (LMMs).
- Number of Samples: Over 28,000 high‑quality samples
- Question Types: Multiple‑choice, fill‑in‑the‑blank, and other formats, each with a detailed solution.
- Applicable Grades: Covers 12 grades from primary school to high school.
- Characteristics: Some questions or solutions may contain visual context, increasing the dataset’s difficulty.
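As a rough illustration of how the properties listed above might be inspected in practice, the following is a minimal Python sketch, not an official loader. It assumes the Hugging Face `datasets` library is installed. The full Hub repository ID and the split names are not given on this page, so they are left as parameters, and the field names in the toy records (`type`, `image`) are hypothetical placeholders rather than the dataset's real schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_cmm_math(repo_id: str, split: str):
    """Load one split of the dataset from the Hugging Face Hub.

    Both arguments are left to the caller on purpose: this page names the
    dataset only as "cmm-math", so the full repository ID and the split
    names have to be looked up on the Hub.
    """
    # Imported lazily so the helper below works without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(repo_id, split=split)


def count_by_field(records: Iterable[Mapping], field: str) -> Counter:
    """Tally records by the value of one field, e.g. question type or grade."""
    return Counter(str(r.get(field, "<missing>")) for r in records)


# Toy records with HYPOTHETICAL field names, used only to show the call shape.
toy = [
    {"type": "choice", "image": ["1.png"]},
    {"type": "fill_in_the_blank", "image": []},
    {"type": "choice", "image": []},
]
print(count_by_field(toy, "type"))
```

Once the real schema is known, the same `count_by_field` call could be pointed at the actual question-type or grade column to check the coverage described above.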
Dataset Structure
... (structure details omitted as per original) ...
Related Papers
... (details omitted) ...