Small-Chinese-Corpus
Contains multiple Chinese corpora, such as provincial‑city latitude/longitude coordinates, postal codes, administrative division codes, idioms, personal names, named‑entity recognition data, relation recognition data, reading comprehension, and image‑text QA data.
Description
Overview of Small Chinese Corpus Datasets
Dataset List
- China Provincial‑City Latitude/Longitude Coordinates
  - Path: city_location/
- China Provincial Postal Code Directory
  - Path: postal_provinces/
- National Administrative and Urban‑Rural Division Codes (2015)
  - Path: china_geo_code/
- Idioms Collection
  - Path: chengyu/
- Chinese Personal Names, plus characters from Jin Yong novels, Romance of the Three Kingdoms, and Dream of the Red Chamber
  - Path: chi_names/
- Chinese Named‑Entity Recognition Sample
  - Path: NER_chi/
- Chinese Relation Recognition Sample
  - Path: relation_multiple_chi/
- Chinese Reading Comprehension Sample
  - Path: reading_comprehension_chi/
- Chinese Image‑Text QA data (based on MSCOCO)
  - Path: Chinese_Visual_QA_pairs/
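
The directory layout above can be checked against a local copy of the repository. The following is a minimal sketch, assuming the repository has been cloned to a local folder named `Small-Chinese-Corpus`. That folder name, the `DATASETS` mapping, and the `find_datasets` helper are illustrative and not part of the repository. The listing does not state the file formats inside each directory, so the sketch only enumerates files and does not parse them.

```python
from pathlib import Path

# Directory names are taken from the dataset list above.
# The descriptions are shortened labels added for illustration.
DATASETS = {
    "city_location": "Provincial-city latitude/longitude coordinates",
    "postal_provinces": "Provincial postal code directory",
    "china_geo_code": "Administrative and urban-rural division codes (2015)",
    "chengyu": "Idioms collection",
    "chi_names": "Chinese personal names and novel characters",
    "NER_chi": "Named-entity recognition sample",
    "relation_multiple_chi": "Relation recognition sample",
    "reading_comprehension_chi": "Reading comprehension sample",
    "Chinese_Visual_QA_pairs": "Image-text QA data (based on MSCOCO)",
}


def find_datasets(repo_root):
    """Map each listed directory found under repo_root to its sorted file paths."""
    root = Path(repo_root)
    found = {}
    for name in DATASETS:
        folder = root / name
        if folder.is_dir():
            # Collect every regular file, including files in subdirectories.
            found[name] = sorted(p for p in folder.rglob("*") if p.is_file())
    return found


if __name__ == "__main__":
    # Assumed local clone location; adjust to wherever the data lives.
    for name, files in find_datasets("Small-Chinese-Corpus").items():
        print(f"{name}: {len(files)} file(s) - {DATASETS[name]}")
```

Directories that are missing from the local copy are skipped, so the output also shows which of the nine listed datasets are actually present.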
Source
Organization: github
Created: 4/10/2019