CCLUE
CCLUE is a benchmark for evaluating natural language understanding of classical Chinese. It contains multiple task datasets: sentence segmentation and punctuation, named entity recognition, classical Chinese text classification, classical poetry sentiment classification, and classical-to-modern Chinese (文白) retrieval.
Dataset Overview
Dataset Name: CCLUE
Description: CCLUE is a benchmark for evaluating natural language understanding of classical Chinese, containing multiple task datasets, benchmark models, and evaluation code. Researchers can quickly evaluate various pre-trained language models with simple code.
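As an illustration of that workflow, the sketch below scores a HuggingFace sequence classifier on a couple of classical-Chinese sentences. The checkpoint name, label IDs, and example texts are placeholder assumptions, not CCLUE's official baselines or data; this is a minimal sketch, not the benchmark's actual evaluation code.

```python
# Minimal sketch: scoring a HuggingFace sequence classifier on a few
# classical-Chinese examples. The checkpoint name, labels, and texts are
# ASSUMPTIONS for illustration; substitute the model and CCLUE data you use.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "your-org/your-classical-chinese-classifier"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

# (text, gold_label_id) pairs -- placeholder examples, not CCLUE data
examples = [
    ("學而時習之，不亦說乎", 0),
    ("大漠孤煙直，長河落日圓", 1),
]

correct = 0
with torch.no_grad():
    for text, gold in examples:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        pred = model(**inputs).logits.argmax(dim=-1).item()
        correct += int(pred == gold)

print(f"accuracy = {correct / len(examples):.3f}")
```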
Tasks and Dataset Details
| Task Name | Abbr. | Train Set | Dev Set | Test Set | Task Type | Metric |
|---|---|---|---|---|---|---|
| Sentence Segmentation & Punctuation | S&P | 26,935 | 4,075 | 3,992 | Sequence Labeling | F1 |
| Named Entity Recognition | NER | 2,566 | 281 | 327 | Sequence Labeling | F1 |
| Classical Chinese Classification | CLS | 160,000 | 20,000 | 20,000 | Text Classification | Acc |
| Classical Poetry Sentiment Classification | SENT | 16,000 | 2,000 | 2,000 | Text Classification | Acc |
| Classical-to-Modern Chinese (文白) Retrieval | RETR | - | - | 10,000 | Text Retrieval | Acc |
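Both sequence-labeling tasks are scored with F1. As a concrete illustration of the metric (not CCLUE's official scorer), the sketch below computes micro-averaged precision, recall, and F1 over predicted punctuation insertions; the (position, label) data format is an assumption made for the example.

```python
# Illustrative micro-F1 over (position, label) predictions, e.g. for the
# punctuation task: each item marks where a punctuation mark is inserted.
# This is NOT CCLUE's official scorer; the format is assumed for the example.

def micro_f1(gold: set, pred: set) -> tuple[float, float, float]:
    """Precision/recall/F1 over sets of (position, label) pairs."""
    tp = len(gold & pred)                     # correctly predicted items
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(3, "，"), (7, "。")}                  # toy gold punctuation
pred = {(3, "，"), (5, "、"), (7, "。")}       # toy predictions
p, r, f = micro_f1(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")      # P=0.67 R=1.00 F1=0.80
```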
Evaluation Method
- Quick Evaluation: No code download is required; submit the model to HuggingFace and request evaluation. Results are returned within three business days.
- Local Evaluation: Download the data and code, install the dependencies, prepare the model, and run the evaluation script; results are saved in the `outputs` folder (a minimal sketch of such a run follows this list).
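As a rough sketch of what such a local run might produce, the code below computes top-1 accuracy for the classical-to-modern retrieval task from precomputed sentence embeddings and writes the score into an `outputs` folder. The file name, embedding shapes, and stand-in data are all assumptions; the actual CCLUE repository defines its own scripts and output format.

```python
# Hypothetical local-evaluation sketch for the 文白 retrieval task:
# top-1 accuracy over cosine similarity of precomputed embeddings.
# File names, shapes, and random stand-in data are ASSUMPTIONS,
# not CCLUE's actual code or data.
import json
from pathlib import Path

import numpy as np

# Assumed inputs: row i of each matrix embeds the i-th classical sentence
# and its modern-Chinese counterpart, so the correct match is the diagonal.
# (1,000 random rows stand in for the 10,000-item test set.)
classical = np.random.rand(1_000, 768).astype(np.float32)
modern = np.random.rand(1_000, 768).astype(np.float32)

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosines."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sims = normalize(classical) @ normalize(modern).T   # cosine similarity matrix
top1 = sims.argmax(axis=1)                          # best match per query
accuracy = float((top1 == np.arange(len(sims))).mean())

out_dir = Path("outputs")                           # matches the docs above
out_dir.mkdir(exist_ok=True)
(out_dir / "retrieval_results.json").write_text(
    json.dumps({"task": "RETR", "metric": "Acc", "value": accuracy})
)
print(f"top-1 retrieval accuracy = {accuracy:.4f}")
```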
Submitting Results
Results can be submitted to the CCLUE leaderboard. A submission must include the organization, model name, project or paper URL, a link to the model weights, and the evaluation results. All submissions must be reproducible and are verified before appearing on the leaderboard.