Dataset asset · Open Source Community · Multi‑Task Evaluation · Chinese Language Understanding

haonan-li/cmmlu

CMMLU is a comprehensive Chinese evaluation suite specifically designed to assess large‑scale multi‑task language understanding capabilities in Chinese linguistic and cultural contexts. It covers 67 subjects ranging from basic to advanced professional levels, including STEM fields such as physics and mathematics as well as humanities and social sciences. Many tasks involve nuanced phrasing and cultural specifics that are hard to translate. Answers for many tasks are China‑specific and may not be applicable elsewhere. Each subject provides development and test sets; every question is a four‑option multiple‑choice item with a single correct answer.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 13, 2023
Signals
292 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • CMMLU

Description

  • CMMLU is a comprehensive benchmark for evaluating large language models (LLMs) on advanced knowledge and reasoning abilities within Chinese language and cultural contexts. It spans 67 topics from elementary to advanced professional levels, covering STEM subjects such as physics and mathematics, as well as humanities and social sciences.

Features

  • Multiple‑choice question‑answering tasks.
  • Each question offers four options with only one correct answer.
  • Many tasks contain context‑specific nuances and phrasing that are difficult to translate.
  • Answers for many tasks are China‑specific and may not be suitable for other regions or languages.

Structure

  • Provides development and test sets for each topic.
  • Development set contains 5 questions per topic; test set contains over 100 questions per topic.
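The five‑item development split is commonly used as few‑shot exemplars when evaluating on the test split. A minimal sketch of assembling a 5‑shot prompt is shown below; the field names (`Question`, `A`–`D`, `Answer`) and the sample items are illustrative assumptions, not real CMMLU data:

```python
# Sketch: building a few-shot prompt from a subject's development split.
# Items follow an assumed (Question, A-D, Answer) column layout; the
# exemplars here are invented stand-ins rather than real CMMLU questions.

def render(item: dict, with_answer: bool) -> str:
    """Render one four-option item, optionally revealing its answer."""
    body = (
        f"{item['Question']}\n"
        f"A. {item['A']}\nB. {item['B']}\nC. {item['C']}\nD. {item['D']}\n"
    )
    return body + (f"Answer: {item['Answer']}\n" if with_answer else "Answer:")

def few_shot_prompt(dev_items: list, test_item: dict) -> str:
    """Prepend the dev-split exemplars (answers shown) to the test question."""
    shots = "\n".join(render(d, with_answer=True) for d in dev_items)
    return shots + "\n" + render(test_item, with_answer=False)

# Five dev exemplars per subject, as described above (stand-in content)
dev = [
    {"Question": f"Example question {i}?", "A": "a", "B": "b",
     "C": "c", "D": "d", "Answer": "A"}
    for i in range(5)
]
test_q = {"Question": "Target question?", "A": "a", "B": "b",
          "C": "c", "D": "d", "Answer": "B"}

print(few_shot_prompt(dev, test_q))
```

The prompt ends with a bare "Answer:" so the model's next token can be compared against the held‑out gold letter.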

Usage

  • The dataset can be loaded in Python, either one topic at a time or all topics at once.
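The per‑topic loading pattern might look like the sketch below, assuming the Hugging Face `datasets` library. The network call is kept behind a flag, and the config names ("physics", "all") are assumptions to verify against the dataset card:

```python
# Sketch: per-subject vs. all-subjects loading of CMMLU.
# The download is gated behind a flag because it needs network access;
# the subject config names are assumptions based on the dataset card.

USE_HUB = False  # flip to True to actually download from the Hub

if USE_HUB:
    from datasets import load_dataset
    physics = load_dataset("haonan-li/cmmlu", "physics")  # one subject
    # An aggregate config, if the card provides one (assumed name):
    # everything = load_dataset("haonan-li/cmmlu", "all")
    splits = sorted(physics.keys())
else:
    # The two splits each subject is documented to provide
    splits = ["dev", "test"]

print(splits)
```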

License

Citation

@misc{li2023cmmlu,
      title={CMMLU: Measuring massive multitask language understanding in Chinese}, 
      author={Haonan Li and Yixuan Zhang and Fajri Koto and Yifei Yang and Hai Zhao and Yeyun Gong and Nan Duan and Timothy Baldwin},
      year={2023},
      eprint={2306.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio