mmdjiji/bert-chinese-idioms

This dataset is used to train a BERT language model for Chinese idioms. The training data is generated by the Node.js script preprocess.js.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Jun 28, 2022

Dataset Overview

License

  • License Type: GPL-3.0

Dataset Usage

  • Used to train language models

Preprocessing Tools

  • Preprocessing script: preprocess.js
  • Script type: Node.js