Small-Chinese-Corpus
This collection contains several small Chinese corpora: provincial‑city latitude/longitude coordinates, postal codes, administrative division codes, idioms, personal names, and sample data for named‑entity recognition, relation recognition, reading comprehension, and image‑text QA.
Overview of Small Chinese Corpus Datasets
Dataset List
- China Provincial‑City Latitude/Longitude Coordinates
  - Path: city_location/
- China Provincial Postal Code Directory
  - Path: postal_provinces/
- National Administrative and Urban‑Rural Division Codes (2015)
  - Path: china_geo_code/
- Idioms Collection
  - Path: chengyu/
- Chinese Personal Names, plus characters from Jin Yong novels, Romance of the Three Kingdoms, and Dream of the Red Chamber
  - Path: chi_names/
- Chinese Named‑Entity Recognition Sample
  - Path: NER_chi/
- Chinese Relation Recognition Sample
  - Path: relation_multiple_chi/
- Chinese Reading Comprehension Sample
  - Path: reading_comprehension_chi/
- Chinese Image‑Text QA Data (based on MSCOCO)
  - Path: Chinese_Visual_QA_pairs/
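To check which of the datasets listed above are present in a local copy, a minimal Python sketch could look like the following. It assumes the corpus has been downloaded so that the listed directories sit directly under one root folder. The function name `summarize_corpus` and the short labels are hypothetical and used only for illustration. Because the file formats inside each directory are not described here, the sketch only counts files and does not try to parse them.

```python
from pathlib import Path

# Directory names exactly as given in the dataset list.
# The labels are shortened descriptions added for illustration only.
DATASET_DIRS = {
    "city_location": "provincial-city latitude/longitude coordinates",
    "postal_provinces": "provincial postal code directory",
    "china_geo_code": "administrative and urban-rural division codes (2015)",
    "chengyu": "idioms collection",
    "chi_names": "Chinese personal names and novel characters",
    "NER_chi": "named-entity recognition sample",
    "relation_multiple_chi": "relation recognition sample",
    "reading_comprehension_chi": "reading comprehension sample",
    "Chinese_Visual_QA_pairs": "image-text QA data (based on MSCOCO)",
}


def summarize_corpus(root):
    """Map each listed directory to its file count, or None if it is missing."""
    root = Path(root)
    summary = {}
    for name in DATASET_DIRS:
        folder = root / name
        if folder.is_dir():
            # Count regular files at any depth below the dataset directory.
            summary[name] = sum(1 for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


if __name__ == "__main__":
    # "." assumes the script is run from the corpus root; adjust as needed.
    for name, count in summarize_corpus(".").items():
        status = "missing" if count is None else f"{count} file(s)"
        print(f"{name}/ ({DATASET_DIRS[name]}): {status}")
```

A directory reported as missing usually means the script was run from the wrong folder or the download was partial.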