Dataset asset: Open Source Community, Linguistic Research, Historical Data Analysis
ChineseDiachronicCorpus
A Chinese diachronic corpus spanning more than sixty years, comprising Tencent News (2009‑2016), People's Daily (1946‑2003), and Reference News (1957‑2002). As a longitudinal corpus, it supports studies of language change over time, language monitoring, and research on sociocultural transformation.
Source: GitHub
Created: Jan 10, 2021
Updated: Jan 10, 2021
ChineseDiachronicCorpus Dataset Overview
Dataset Content
- Tencent News: Time span 2009‑2016, approx. 5 GB.
- People's Daily: Time span 1946‑2003, approx. 3.44 GB.
- Reference News: Time span 1957‑2002, approx. 1.1 GB.
Dataset Applications
The dataset supports research in the following six areas:
| Application | Technical Approach | Use Case |
|---|---|---|
| Lexical Study | Tokenisation, frequency analysis | Vocabulary compilation |
| Semantic Computation | Co‑occurrence, mutual information (MI) collocation, dependency collocation | Semantic lexicon creation |
| Trend Analysis | Circulation calculation, term extraction | Buzzword monitoring |
| Cultural Computation | Colour analysis, gender analysis | Cultural change study |
| Media Comparison | Media difference calculation | Communication research |
| Grammar Research | Grammar pattern retrieval | Grammar textbook and dictionary creation |
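Two of the approaches in the table, frequency analysis and MI collocation, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than part of the dataset: the text is taken to be already segmented into tokens (Chinese requires a word segmenter first, and the corpus file format is not described here), the function names `relative_frequency` and `pmi` are hypothetical, MI is computed as pointwise mutual information over adjacent token pairs, and the sample tokens are invented toy data rather than excerpts from the corpus.

```python
import math
from collections import Counter


def relative_frequency(tokens_by_period, term):
    """Return {period: occurrences of `term` per million tokens}."""
    result = {}
    for period, tokens in tokens_by_period.items():
        total = len(tokens)
        count = Counter(tokens)[term]
        result[period] = 1_000_000 * count / total if total else 0.0
    return result


def pmi(tokens, w1, w2):
    """Pointwise mutual information (log base 2) of the adjacent pair (w1, w2).

    Returns -inf when the pair never occurs in `tokens`.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    pair_count = bigrams[(w1, w2)]
    if n_bigrams <= 0 or pair_count == 0:
        return float("-inf")
    p_pair = pair_count / n_bigrams
    p_w1 = unigrams[w1] / n_unigrams
    p_w2 = unigrams[w2] / n_unigrams
    return math.log2(p_pair / (p_w1 * p_w2))


# Invented toy data, already tokenised; NOT taken from the corpus.
sample = {
    "1950s": ["人民", "公社", "人民", "日报"],
    "2010s": ["互联网", "公司", "人民", "网", "互联网", "金融", "互联网", "公司"],
}

freq = relative_frequency(sample, "互联网")
score = pmi(sample["2010s"], "互联网", "公司")
print(freq)
print(score)
```

Comparing the per-million rate of a term across periods is one simple way to approach the trend and buzzword monitoring use cases, while a high PMI flags word pairs that co-occur more often than chance would predict. On real data, counts this small would be unreliable, so a minimum frequency threshold is usually applied before ranking collocations.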
Data Access
The dataset is hosted on Baidu Cloud. Access details are as follows:
| Data Name | Time Span | Size | Access Link & Password |
|---|---|---|---|
| Tencent News | 2009‑2016 | 5 GB | Link: https://pan.baidu.com/s/16VMV1JioSrKGUQ0T7YfIGw Password: 57ux |
| People's Daily | 1946‑2003 | 3.44 GB | Link: https://pan.baidu.com/s/1vUwt7hpoQLx-vgzsZjaBlw Password: jyvo |
| Reference News | 1957‑2002 | 1.1 GB | Link: https://pan.baidu.com/s/1Ux_WCpkLqtfE60jXfGD3ow Password: 6ekf |