Dataset asset | Open Source Community | Natural Language Processing | Chinese Corpus

Small-Chinese-Corpus

A collection of small Chinese corpora: latitude/longitude coordinates for provinces and cities, postal codes, administrative division codes, idioms, personal names, and sample data for named‑entity recognition, relation recognition, reading comprehension, and image‑text QA.

Source: GitHub
Created: Apr 10, 2019
Updated: Apr 10, 2019

Overview of Small Chinese Corpus Datasets

Dataset List

  1. China Provincial‑City Latitude/Longitude Coordinates
    • Path: city_location/
  2. China Provincial Postal Code Directory
    • Path: postal_provinces/
  3. National Administrative and Urban‑Rural Division Codes (2015)
    • Path: china_geo_code/
  4. Idioms Collection
    • Path: chengyu/
  5. Chinese Personal Names, plus characters from Jin Yong novels, Romance of the Three Kingdoms, and Dream of the Red Chamber
    • Path: chi_names/
  6. Chinese Named‑Entity Recognition Sample
    • Path: NER_chi/
  7. Chinese Relation Recognition Sample
    • Path: relation_multiple_chi/
  8. Chinese Reading Comprehension Sample
    • Path: reading_comprehension_chi/
  9. Chinese Image‑Text QA data (based on MSCOCO)
    • Path: Chinese_Visual_QA_pairs/
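
The listing gives directory paths but does not describe the files inside them, so a reasonable first step after cloning is to check which of the listed directories are present and what they contain. The Python sketch below does only that. The directory names come from the list above; the clone location "Small-Chinese-Corpus" and the helper name summarize_corpora are assumptions made for illustration, and the sketch makes no claim about file formats.

```python
from pathlib import Path

# Directory names exactly as given in the dataset list.
CORPUS_DIRS = {
    "city_location": "provincial-city latitude/longitude coordinates",
    "postal_provinces": "provincial postal code directory",
    "china_geo_code": "administrative and urban-rural division codes (2015)",
    "chengyu": "idioms collection",
    "chi_names": "Chinese personal names and novel characters",
    "NER_chi": "named-entity recognition sample",
    "relation_multiple_chi": "relation recognition sample",
    "reading_comprehension_chi": "reading comprehension sample",
    "Chinese_Visual_QA_pairs": "image-text QA data (based on MSCOCO)",
}


def summarize_corpora(repo_root):
    """Map each listed directory found under repo_root to its sorted file paths.

    Paths are relative to the corpus directory and use forward slashes.
    Listed directories that are missing from repo_root are skipped.
    """
    root = Path(repo_root)
    summary = {}
    for name in CORPUS_DIRS:
        folder = root / name
        if not folder.is_dir():
            continue
        summary[name] = sorted(
            p.relative_to(folder).as_posix()
            for p in folder.rglob("*")
            if p.is_file()
        )
    return summary


if __name__ == "__main__":
    # Assumed location of a local clone; adjust to wherever the repo lives.
    for name, files in summarize_corpora("Small-Chinese-Corpus").items():
        print(f"{name}: {len(files)} file(s), {CORPUS_DIRS[name]}")
```

Running the script against a local copy prints one line per directory that exists, which makes it easy to spot a missing or renamed folder before writing any format-specific loader.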