Small-Chinese-Corpus

A collection of small Chinese corpora: province and city latitude/longitude coordinates, postal codes, administrative division codes, idioms, personal names, named‑entity recognition data, relation recognition data, reading comprehension data, and image‑text QA data.

Updated 4/10/2019
Description

Overview of Small Chinese Corpus Datasets

Dataset List

  1. China Provincial‑City Latitude/Longitude Coordinates
    • Path: city_location/
  2. China Provincial Postal Code Directory
    • Path: postal_provinces/
  3. National Administrative and Urban‑Rural Division Codes (2015)
    • Path: china_geo_code/
  4. Idioms Collection
    • Path: chengyu/
  5. Chinese Personal Names, plus character names from Jin Yong's novels, Romance of the Three Kingdoms, and Dream of the Red Chamber
    • Path: chi_names/
  6. Chinese Named‑Entity Recognition Sample
    • Path: NER_chi/
  7. Chinese Relation Recognition Sample
    • Path: relation_multiple_chi/
  8. Chinese Reading Comprehension Sample
    • Path: reading_comprehension_chi/
  9. Chinese Image‑Text QA Data (based on MSCOCO)
    • Path: Chinese_Visual_QA_pairs/
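
The directory layout above can be checked against a local copy of the repository. The following is a minimal sketch, assuming the repository has already been cloned and that the nine directories sit directly under its root. The helper name `find_datasets`, the `repo_root` argument, and the short labels in `DATASET_DIRS` are hypothetical and introduced only for illustration. The listing does not describe the file formats inside each directory, so the sketch only reports file names.

```python
from pathlib import Path

# Directory names as given in the dataset list; the labels are shortened
# descriptions added for illustration, not names used by the repository.
DATASET_DIRS = {
    "city_location": "Province/city latitude and longitude coordinates",
    "postal_provinces": "Provincial postal code directory",
    "china_geo_code": "Administrative and urban-rural division codes (2015)",
    "chengyu": "Idioms collection",
    "chi_names": "Chinese personal names and literary character names",
    "NER_chi": "Named-entity recognition sample",
    "relation_multiple_chi": "Relation recognition sample",
    "reading_comprehension_chi": "Reading comprehension sample",
    "Chinese_Visual_QA_pairs": "Image-text QA data (based on MSCOCO)",
}


def find_datasets(repo_root):
    """Map each listed directory that exists under repo_root to its sorted file names."""
    root = Path(repo_root)
    found = {}
    for name in DATASET_DIRS:
        directory = root / name
        if directory.is_dir():
            # Only regular files are reported; subdirectories are skipped.
            found[name] = sorted(p.name for p in directory.iterdir() if p.is_file())
    return found


if __name__ == "__main__":
    # Assumes the script is run from the root of a local clone.
    for name, files in find_datasets(".").items():
        print(f"{name}: {len(files)} file(s) - {DATASET_DIRS[name]}")
```

Directories that are missing from the local copy are simply left out of the result, so the output also shows which parts of the corpus are present.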

Topics

Chinese Corpus
Natural Language Processing

Source

Organization: github

Created: 4/10/2019
