Chinese-SimpleQA
Chinese SimpleQA is a Chinese benchmark for evaluating the factual correctness of language models on short questions. It has five defining characteristics: Chinese language, diversity, high quality, static reference answers, and ease of evaluation. The dataset contains 3,000 high-quality questions across six major topics and 99 fine-grained sub-topics, ranging from the humanities to science and engineering. It is intended to help developers assess the factual accuracy of their models in Chinese and to support algorithm research.
Chinese SimpleQA Dataset Overview
Basic Information
- License: cc-by-nc-sa-4.0
- Task Category: Question Answering
- Language: Chinese
- Dataset Name: Chinese SimpleQA
- Scale: 10K < n < 100K (listed size category; the dataset itself contains 3,000 questions)
Introduction
- Goal: Evaluate language models' factual accuracy on short questions.
- Key Features:
  - Chinese: Dedicated to the Chinese language, assessing LLM factuality in Chinese.
  - Diversity: Covers six major topics ("Chinese Culture", "Humanities", "Engineering, Technology & Applied Science", "Life, Art & Culture", "Society", "Natural Science") with 99 sub-topics.
  - High Quality: A strict quality-control process ensures that questions and reference answers are accurate.
  - Static: Reference answers remain unchanged over time.
  - Easy Evaluation: Questions and answers are short, so models can be graded quickly through existing LLM APIs (e.g., the OpenAI API).
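Because both questions and reference answers are short, a grading loop can be very small. The sketch below is only an illustration under stated assumptions, not the benchmark's official evaluation procedure: `string_match_grader` is a hypothetical stand-in that checks whether the normalized reference appears in the model's answer, and `answer_fn` is a hypothetical callable wrapping whatever model is under test. An LLM-based judge called through an API could be plugged in through the same `grader` parameter.

```python
import unicodedata
from typing import Callable, Iterable, Tuple


def normalize(text: str) -> str:
    """NFKC-normalize, lowercase, and drop whitespace and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(
        ch for ch in text
        if not (ch.isspace() or unicodedata.category(ch).startswith("P"))
    )


def string_match_grader(question: str, reference: str, prediction: str) -> bool:
    """Hypothetical stand-in grader: is the reference contained in the prediction?"""
    ref = normalize(reference)
    return bool(ref) and ref in normalize(prediction)


def accuracy(
    items: Iterable[Tuple[str, str]],
    answer_fn: Callable[[str], str],
    grader: Callable[[str, str, str], bool] = string_match_grader,
) -> float:
    """Fraction of (question, reference) pairs that the grader marks correct."""
    items = list(items)
    if not items:
        return 0.0
    correct = sum(grader(q, ref, answer_fn(q)) for q, ref in items)
    return correct / len(items)
```

Simple string matching will miss paraphrased or differently formatted answers, which is one reason to swap in a stronger grader when precision matters.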
Content
- Topic Coverage: 6 major topics, 99 fine‑grained sub‑topics.
- Number of Questions: 3,000 high‑quality questions.
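Since the questions are grouped under six major topics, it is often useful to report accuracy per topic as well as overall. The following is a minimal sketch, assuming each graded result is a dict with hypothetical `topic` and `correct` keys; the dataset's actual field names may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_topic(results: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Group graded results by topic and return the accuracy for each topic."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for result in results:
        topic = str(result["topic"])      # hypothetical field name
        totals[topic] += 1
        correct[topic] += int(bool(result["correct"]))  # hypothetical field name
    return {topic: correct[topic] / totals[topic] for topic in totals}
```

A per-topic breakdown makes it easier to see whether a model is weak in, for example, "Chinese Culture" while strong in "Natural Science".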
Citation
@misc{he2024chinesesimpleqachinesefactuality,
  title={Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models},
  author={Yancheng He and Shilong Li and Jiaheng Liu and Yingshui Tan and Weixun Wang and Hui Huang and Xingyuan Bu and Hangyu Guo and Chengwei Hu and Boren Zheng and Zhuoran Lin and Xuepeng Liu and Dekai Sun and Shirong Lin and Zhicheng Zheng and Xiaoyong Zhu and Wenbo Su and Bo Zheng},
  year={2024},
  eprint={2411.07140},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.07140},
}