Xiezhi
Xiezhi is a comprehensive evaluation suite for assessing broad domain knowledge. It spans 516 disciplines grouped into 13 topics and totals 249,587 multiple‑choice questions, plus two subsets (Xiezhi‑Specialty and Xiezhi‑Interdiscipline) of 15,000 questions each.
Description
Dataset Overview
Xiezhi (獬豸) is a comprehensive evaluation suite for assessing the holistic domain knowledge of language models (LMs). It contains 249,587 multiple‑choice questions covering 516 disciplines and four difficulty levels.
Details
Question Design
- Each evaluated LM must select the best answer from 50 options.
- Each question comes with the correct answer and three distractors; the remaining 46 options are drawn at random from all options across the Xiezhi dataset.
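The option layout described above (1 correct answer, 3 distractors, 46 random fillers) can be sketched as follows. This is a minimal illustration, not the benchmark's own code: the function name `build_options`, its parameters, and the choice to exclude duplicates from the filler pool are all assumptions made for the example.

```python
import random


def build_options(answer, distractors, option_pool, total=50, seed=None):
    """Assemble a shuffled option list: answer + distractors + random fillers.

    Hypothetical helper; the names and the de-duplication step are
    assumptions, not taken from the Xiezhi repository.
    """
    rng = random.Random(seed)
    fixed = [answer, *distractors]
    # Drop options already in use so no choice appears twice;
    # sorting makes the sample reproducible for a given seed.
    candidates = sorted(set(option_pool) - set(fixed))
    fillers = rng.sample(candidates, total - len(fixed))
    options = fixed + fillers
    rng.shuffle(options)
    # Return the options and the position of the correct answer.
    return options, options.index(answer)


pool = [f"option {i}" for i in range(200)]
options, answer_index = build_options("right", ["d1", "d2", "d3"], pool, seed=0)
print(len(options), options[answer_index])
```

With three distractors and `total=50`, the helper draws exactly 46 fillers, matching the counts given in the list above.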
Evaluation Metric
- Mean Reciprocal Rank (MRR) is used: the reciprocal of the rank the model assigns to the correct answer, averaged over all questions.
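As a rough sketch of the metric, MRR is the average of 1/rank over all questions, where rank is the 1-based position of the correct answer in the model's ordering of the options. The function names below are hypothetical, and how the ordering of the 50 options is obtained from a model is left outside the example.

```python
def reciprocal_rank(ranked_options, answer):
    """Return 1/rank of the answer (1-based), or 0.0 if it is absent."""
    for rank, option in enumerate(ranked_options, start=1):
        if option == answer:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, answers):
    """Average the reciprocal ranks over all questions."""
    scores = [reciprocal_rank(r, a) for r, a in zip(rankings, answers)]
    return sum(scores) / len(scores)


# Three toy questions where the correct answer "A" is ranked 1st, 2nd, and 4th.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "C", "D"],
    ["B", "C", "D", "A"],
]
answers = ["A", "A", "A"]
print(mean_reciprocal_rank(rankings, answers))  # (1 + 1/2 + 1/4) / 3
```

A model that always ranks the correct answer first scores 1.0; with 50 options, ranking it last on every question gives 1/50 = 0.02.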
Sample Data
- Examples from the Xiezhi‑Specialty and Xiezhi‑Interdiscipline subsets are provided.
- Few‑shot learning configurations are demonstrated.
Usage
- Tests on C‑Eval, M3KE, MMLU, Xiezhi‑Inter, and Xiezhi‑Spec are bundled in ./Tester/model_test.py.
- Anyone can execute ./Tester/test.sh to run the evaluation.
- For custom data, override the _get_data function in ./Tester/model_test.py.
License
- This work is released under the MIT License.
- The Xiezhi dataset itself is under the Creative Commons Attribution‑NonCommercial‑ShareAlike 4.0 International License.
Citation
Please cite the following paper when using the dataset:
@article{gu2023xiezhi,
title={Xiezhi: An Ever‑Updating Benchmark for Holistic Domain Knowledge Evaluation},
author={Gu, Zhouhong and Zhu, Xiaoxuan and Ye, Haoning and Zhang, Lin and Wang, Jianchen and Jiang, Sihang and Xiong, Zhuozhi and Li, Zihan and He, Qianyu and Xu, Rui and Huang, Wenhao and Zheng, Weiguo and Feng, Hongwei and Xiao, Yanghua},
journal={arXiv:2304.11679},
year={2023}
}
Source
Organization: arXiv
Created: 6/9/2023