Tags: Dataset asset, Open Source Community, Linguistic Research, Historical Data Analysis

ChineseDiachronicCorpus

A Chinese diachronic corpus spanning more than sixty years, comprising Tencent News (2009‑2016), People's Daily (1946‑2003), and Reference News (1957‑2002). As a longitudinal resource, it supports studies of language change over time, language monitoring, and research on sociocultural transformation.

Source: GitHub
Created: Jan 10, 2021
Updated: Jan 10, 2021
Overview

ChineseDiachronicCorpus Dataset Overview

Dataset Content

  • Tencent News: Time span 2009‑2016, approx. 5 GB.
  • People's Daily: Time span 1946‑2003, approx. 3.44 GB.
  • Reference News: Time span 1957‑2002, approx. 1.1 GB.

Dataset Applications

The dataset can be used for the following six research areas:

| Application | Technical Approach | Use Case |
|---|---|---|
| Lexical Study | Tokenisation, frequency analysis | Vocabulary compilation |
| Semantic Computation | Co‑occurrence, MI collocation, dependency collocation | Semantic lexicon creation |
| Trend Analysis | Circulation calculation, term extraction | Buzzword monitoring |
| Cultural Computation | Colour analysis, gender analysis | Cultural change study |
| Media Comparison | Media difference calculation | Communication research |
| Grammar Research | Grammar pattern retrieval | Grammar textbook and dictionary creation |
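
Two of the approaches listed above, frequency analysis and MI collocation, can be sketched in a few lines of Python. This is a minimal illustration, not the corpus authors' tooling. It assumes the text has already been segmented into tokens and grouped by year; the `(year, tokens)` layout, the function names `relative_frequency_by_year` and `bigram_pmi`, and the sample sentences are all hypothetical and do not come from the dataset.

```python
import math
from collections import Counter


def relative_frequency_by_year(docs, term):
    """Return {year: occurrences of `term` per million tokens}.

    `docs` is an iterable of (year, tokens) pairs. This layout is an
    assumption for illustration, not the corpus's actual file format.
    """
    hits, totals = Counter(), Counter()
    for year, tokens in docs:
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    # Skip years that contain no tokens to avoid dividing by zero.
    return {y: 1e6 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}


def bigram_pmi(docs, w1, w2):
    """Pointwise mutual information (base 2) of `w1` immediately followed by `w2`.

    Returns None when the bigram or either word never occurs.
    """
    unigrams, bigrams = Counter(), Counter()
    for _, tokens in docs:
        unigrams.update(tokens)
        # Adjacent pairs only; bigrams never cross document boundaries.
        bigrams.update(zip(tokens, tokens[1:]))
    n_tokens = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    pair = bigrams[(w1, w2)]
    if pair == 0 or unigrams[w1] == 0 or unigrams[w2] == 0:
        return None
    p_pair = pair / n_bigrams
    p_w1 = unigrams[w1] / n_tokens
    p_w2 = unigrams[w2] / n_tokens
    return math.log2(p_pair / (p_w1 * p_w2))


# Toy, hand-made sample (not taken from the corpus).
sample_docs = [
    (1980, ["改革", "开放", "政策"]),
    (1980, ["经济", "改革"]),
    (2010, ["网络", "经济", "网络"]),
]

print(relative_frequency_by_year(sample_docs, "改革"))
print(bigram_pmi(sample_docs, "改革", "开放"))
```

On the real corpus, a Chinese word segmenter (for example jieba) would have to supply the token lists, and the year would have to be derived from however the downloaded files are organised. Normalising counts per million tokens keeps years with very different amounts of text comparable.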

Data Access

The dataset is hosted on Baidu Cloud. Access details are as follows:

| Data Name | Time Span | Size | Access Link | Password |
|---|---|---|---|---|
| Tencent News | 2009‑2016 | 5 GB | https://pan.baidu.com/s/16VMV1JioSrKGUQ0T7YfIGw | 57ux |
| People's Daily | 1946‑2003 | 3.44 GB | https://pan.baidu.com/s/1vUwt7hpoQLx-vgzsZjaBlw | jyvo |
| Reference News | 1957‑2002 | 1.1 GB | https://pan.baidu.com/s/1Ux_WCpkLqtfE60jXfGD3ow | 6ekf |