Dataset asset · Open Source Community · Natural Language Understanding · Classical Chinese Processing

CCLUE

CCLUE is a benchmark for evaluating natural language understanding of classical Chinese. It contains multiple task datasets: sentence segmentation and punctuation, named entity recognition, classical Chinese classification, classical poetry sentiment classification, and classical-to-vernacular (文白) retrieval.

Source: GitHub
Created: Mar 3, 2021
Updated: Aug 23, 2023
Dataset Overview

Dataset Name: CCLUE

Description: CCLUE is a benchmark for evaluating natural language understanding of classical Chinese. It includes multiple task datasets, benchmark models, and evaluation code, so researchers can evaluate pre-trained language models with only a few lines of code.

Tasks and Dataset Details

Task Name                                 | Abbr. | Train Set | Dev Set | Test Set | Task Type           | Metric
Sentence Segmentation & Punctuation       | S&P   | 26,935    | 4,075   | 3,992    | Sequence Labeling   | F1
Named Entity Recognition                  | NER   | 2,566     | 281     | 327      | Sequence Labeling   | F1
Classical Chinese Classification          | CLS   | 160,000   | 20,000  | 20,000   | Text Classification | Acc
Classical Poetry Sentiment Classification | SENT  | 16,000    | 2,000   | 2,000    | Text Classification | Acc
Classical-to-Vernacular (文白) Retrieval  | RETR  | -         | -       | 10,000   | Text Retrieval      | Acc
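
To make the "Sequence Labeling" task type and the F1 metric more concrete, here is a minimal Python sketch of how punctuation restoration could be cast as per-character labeling and scored. It is an illustration only: the label scheme (each character tagged with the mark that follows it, or "O"), the punctuation inventory, and the token-level micro-averaged F1 are assumptions, and the function names `to_labels` and `micro_f1` are hypothetical. The official CCLUE evaluation code may define labels and F1 differently.

```python
from typing import List, Tuple

# Assumed punctuation inventory; the real benchmark may use a different set.
PUNCT = set("，。？！；：、")


def to_labels(text: str) -> Tuple[List[str], List[str]]:
    """Strip punctuation from text and label each remaining character
    with the mark that followed it, or "O" if none did."""
    chars: List[str] = []
    labels: List[str] = []
    for ch in text:
        if ch in PUNCT:
            if labels:  # a mark at the very start has no character to attach to
                labels[-1] = ch  # with consecutive marks, the last one wins
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Token-level micro-averaged F1 over all non-"O" labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != "O")
    n_pred = sum(1 for p in pred if p != "O")
    n_gold = sum(1 for g in gold if g != "O")
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


chars, gold = to_labels("學而時習之，不亦說乎？")
# A hypothetical prediction with one spurious full stop after the 7th character.
pred = ["O", "O", "O", "O", "，", "O", "。", "O", "？"]
score = micro_f1(gold, pred)
```

Here both gold marks are recovered (recall 1.0) but one of the three predicted marks is wrong (precision 2/3), so the score reflects the extra mark rather than ignoring it, which plain accuracy over mostly-"O" labels would tend to hide.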

Evaluation Method

  • Quick Evaluation: No code download is required. Upload the model to Hugging Face and request an evaluation; results are returned within three business days.
  • Local Evaluation: Download the data and code, install the dependencies, prepare the model, and run the evaluation script. Results are saved in the outputs folder.

Submitting Results

Results can be submitted to the CCLUE leaderboard. Each submission must include the organization, model name, project or paper URL, a link to the model weights, and the evaluation results. All submissions must be reproducible and will be verified before appearing on the leaderboard.
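
As a rough aid for preparing a submission, the required items listed above could be collected in a small record and checked for completeness before sending. The following Python sketch is hypothetical: the class `Submission`, its field names, and the helper `missing_fields` are illustrative and not part of any CCLUE tooling, and the leaderboard's actual submission format is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Submission:
    """Hypothetical container for the items a leaderboard entry must include."""
    organization: str
    model_name: str
    project_url: str   # project or paper URL
    weights_url: str   # link to the model weights
    results: Dict[str, float] = field(default_factory=dict)  # task abbr. -> score


def missing_fields(sub: Submission) -> List[str]:
    """Return the names of required items that are empty."""
    text_fields = ("organization", "model_name", "project_url", "weights_url")
    missing = [name for name in text_fields if not getattr(sub, name).strip()]
    if not sub.results:
        missing.append("results")
    return missing


# Placeholder values only; no real organization, model, or scores are implied.
draft = Submission(
    organization="Example Lab",
    model_name="example-model",
    project_url="",
    weights_url="https://example.org/weights",
)
problems = missing_fields(draft)
```

With the draft above, `problems` lists `project_url` and `results`, the two items still to be filled in.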
