
CCLUE

CCLUE is a benchmark for evaluating natural language understanding of classical Chinese. It contains datasets for multiple tasks: sentence segmentation and punctuation, named entity recognition, classical Chinese classification, classical poetry sentiment classification, and classical-modern Chinese (文白) retrieval.

Updated 8/23/2023
GitHub

Description

Dataset Overview

Dataset Name: CCLUE

Description: CCLUE is a benchmark for evaluating natural language understanding of classical Chinese, containing multiple task datasets, benchmark models, and evaluation code. Researchers can quickly evaluate various pre‑trained language models with simple code.

Tasks and Dataset Details

Task Name | Abbr. | Train Set | Dev Set | Test Set | Task Type | Metric
Sentence Segmentation & Punctuation | S&P | 26,935 | 4,075 | 3,992 | Sequence Labeling | F1
Named Entity Recognition | NER | 2,566 | 281 | 327 | Sequence Labeling | F1
Classical Chinese Classification | CLS | 160,000 | 20,000 | 20,000 | Text Classification | Acc
Classical Poetry Sentiment Classification | SENT | 16,000 | 2,000 | 2,000 | Text Classification | Acc
Classical-Modern Chinese (文白) Retrieval | RETR | - | - | 10,000 | Text Retrieval | Acc
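
The Metric column lists F1 for the sequence labeling tasks and Acc (accuracy) for the classification and retrieval tasks. The sketch below illustrates one common way to compute both. It is not the CCLUE evaluation code: the BIO tag scheme, the span-level (rather than token-level) matching, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) spans from BIO tags (assumed scheme)."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span if the current tag does not continue it.
        if start is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching spans (assumed matching rule)."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

With one gold sentence tagged `B-PER I-PER O B-LOC` and a prediction that finds only the first entity, precision is 1.0 and recall is 0.5, so this definition of F1 yields 2/3.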

Evaluation Method

  • Quick Evaluation: No code download is required. Submit the model to HuggingFace and request an evaluation; results are returned within three business days.
  • Local Evaluation: Download the data and code, install the dependencies, prepare the model, and run the evaluation script. Results are saved in the outputs folder.

Submitting Results

Results can be submitted to the CCLUE leaderboard. Each submission must include the organization, model name, project or paper URL, a link to the model weights, and the evaluation results. All submissions must be reproducible and are verified before they appear on the leaderboard.
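
Before submitting, it can help to check that every required item is present. The sketch below is a hypothetical pre-submission check, not an official CCLUE format: the `Submission` class, its field names, and the assumption that scores for all five tasks are expected are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

# Task abbreviations taken from the task table; requiring all five is an assumption.
TASKS = ("S&P", "NER", "CLS", "SENT", "RETR")


@dataclass
class Submission:
    """Hypothetical record holding the items a leaderboard submission needs."""
    organization: str
    model_name: str
    project_url: str   # project or paper URL
    weights_url: str   # link to the model weights
    results: Dict[str, float]  # task abbreviation -> score


def missing_items(sub: Submission) -> List[str]:
    """Return the names of empty fields and of tasks without a score."""
    problems = [f.name for f in fields(sub) if not getattr(sub, f.name)]
    problems += [f"results[{t}]" for t in TASKS if t not in sub.results]
    return problems
```

An empty list from `missing_items` means every field is filled in and every task has a score. It says nothing about reproducibility, which is verified separately.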


Topics

Classical Chinese Processing
Natural Language Understanding

Source

Organization: GitHub

Created: 3/3/2021
