Dataset asset · Open Source Community · Multi‑Task Evaluation · Chinese Language Understanding

haonan-li/cmmlu

CMMLU is a comprehensive Chinese evaluation suite specifically designed to assess large‑scale multi‑task language understanding capabilities in Chinese linguistic and cultural contexts. It covers 67 subjects ranging from basic to advanced professional levels, including STEM fields such as physics and mathematics as well as humanities and social sciences. Many tasks involve nuanced phrasing and cultural specifics that are hard to translate. Answers for many tasks are China‑specific and may not be applicable elsewhere. Each subject provides development and test sets; every question is a four‑option multiple‑choice item with a single correct answer.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 13, 2023
Signals
292 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • CMMLU

Description

  • CMMLU is a comprehensive benchmark for evaluating large language models (LLMs) on advanced knowledge and reasoning abilities within Chinese language and cultural contexts. It spans 67 topics from elementary to advanced professional levels, covering STEM subjects such as physics and mathematics, as well as humanities and social sciences.

Features

  • Multiple‑choice question‑answering tasks.
  • Each question offers four options with only one correct answer.
  • Many tasks contain context‑specific nuances and phrasing that are difficult to translate.
  • Answers for many tasks are China‑specific and may not be suitable for other regions or languages.

Structure

  • Provides development and test sets for each topic.
  • Development set contains 5 questions per topic; test set contains over 100 questions per topic.
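The five‑item development split is commonly used as few‑shot exemplars when evaluating on the test split. A minimal sketch of assembling a 5‑shot prompt is shown below; the field names (`Question`, `A`–`D`, `Answer`) and the sample items are illustrative assumptions, not real CMMLU data:

```python
# Sketch: building a few-shot prompt from a subject's development split.
# Items follow an assumed (Question, A-D, Answer) column layout; the
# exemplars here are invented stand-ins rather than real CMMLU questions.

def render(item: dict, with_answer: bool) -> str:
    """Render one four-option item, optionally revealing its answer."""
    body = (
        f"{item['Question']}\n"
        f"A. {item['A']}\nB. {item['B']}\nC. {item['C']}\nD. {item['D']}\n"
    )
    return body + (f"Answer: {item['Answer']}\n" if with_answer else "Answer:")

def few_shot_prompt(dev_items: list, test_item: dict) -> str:
    """Prepend the dev-split exemplars (answers shown) to the test question."""
    shots = "\n".join(render(d, with_answer=True) for d in dev_items)
    return shots + "\n" + render(test_item, with_answer=False)

# Five dev exemplars per subject, as described above (stand-in content)
dev = [
    {"Question": f"Example question {i}?", "A": "a", "B": "b",
     "C": "c", "D": "d", "Answer": "A"}
    for i in range(5)
]
test_q = {"Question": "Target question?", "A": "a", "B": "b",
          "C": "c", "D": "d", "Answer": "B"}

print(few_shot_prompt(dev, test_q))
```

The prompt ends with a bare "Answer:" so the model's next token can be compared against the held‑out gold letter.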

Usage

  • The dataset can be loaded in Python, either one topic at a time or all topics at once.
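The per‑topic loading pattern might look like the sketch below, assuming the Hugging Face `datasets` library. The network call is kept behind a flag, and the config names ("physics", "all") are assumptions to verify against the dataset card:

```python
# Sketch: per-subject vs. all-subjects loading of CMMLU.
# The download is gated behind a flag because it needs network access;
# the subject config names are assumptions based on the dataset card.

USE_HUB = False  # flip to True to actually download from the Hub

if USE_HUB:
    from datasets import load_dataset
    physics = load_dataset("haonan-li/cmmlu", "physics")  # one subject
    # An aggregate config, if the card provides one (assumed name):
    # everything = load_dataset("haonan-li/cmmlu", "all")
    splits = sorted(physics.keys())
else:
    # The two splits each subject is documented to provide
    splits = ["dev", "test"]

print(splits)
```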

License

Citation

@misc{li2023cmmlu,
      title={CMMLU: Measuring massive multitask language understanding in Chinese}, 
      author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
      year={2023},
      eprint={2306.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio