Chinese-Poetry-Dataset
This dataset is presented as the most comprehensive database of classical Chinese literature. It contains 55,000 Tang poems, 260,000 Song poems, and 21,000 Song ci (lyrics), covering roughly 14,000 poets from the Tang and Song dynasties and about 1,500 lyricists from the Song era. The texts were collected from online sources.
Dataset Overview
Dataset Name
chinese-poetry: Most Comprehensive Classical Chinese Poetry Database
Dataset Content
- Tang Poetry: 55,000 poems
- Song Poetry: 260,000 poems
- Song Lyrics: 21,000 ci (lyrics)
- Tang & Song Poets: Approximately 14,000
- Song Lyricists: Approximately 1,500
- Other Collections: The Five Dynasties Huajian Collection, the Lyrics of the Two Masters of Southern Tang, the Analects, the Book of Songs, Dream of the Red Chamber, the Four Books and Five Classics, and others.
Data Formats
- Complete Tang Poetry: JSON
- Complete Song Poetry: JSON
- Complete Song Lyrics: CI format
- Other Collections: Various formats
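As a rough illustration of working with the JSON collections, the sketch below parses a small array of poem records. The field names `author`, `title`, and `paragraphs` are assumptions about the record layout, and the inline `SAMPLE` string and the `load_poems` helper are hypothetical; check an actual file before relying on them.

```python
import json

# Hypothetical sample; the field names are an assumed record layout,
# not a documented schema.
SAMPLE = """
[
  {"author": "李白", "title": "静夜思",
   "paragraphs": ["床前明月光，疑是地上霜。", "举头望明月，低头思故乡。"]}
]
"""

def load_poems(text):
    """Parse a JSON array of poem records into a list of plain dicts."""
    records = json.loads(text)
    poems = []
    for rec in records:
        poems.append({
            # .get() tolerates records that lack a field
            "author": rec.get("author", ""),
            "title": rec.get("title", ""),
            "lines": list(rec.get("paragraphs", [])),
        })
    return poems

poems = load_poems(SAMPLE)
for poem in poems:
    print(poem["title"], poem["author"], len(poem["lines"]))
```

To read a real file, the same helper could be fed the file's contents, for example `load_poems(open(path, encoding="utf-8").read())`, where `path` points at one of the JSON files.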
Applications
The dataset can be used for education, research, cultural heritage preservation, and other beneficial purposes.
Analysis
- High‑Frequency Word Analysis: Provides statistics of frequent words in Tang poetry, Song poetry, and Song lyrics.
- Author Works Ranking: Ranks authors by number of works.
- Ci-Tune Statistics: Counts the most popular ci tunes of the Song period.
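The kinds of statistics listed above can be approximated with a few lines of counting code. The sketch below is not the project's own analysis: it assumes records with `author` and `paragraphs` fields, uses a tiny hypothetical in-memory sample, and counts individual characters as a stand-in for word frequency, since real word segmentation would need a tokenizer.

```python
from collections import Counter

# Hypothetical in-memory records; field names are an assumed layout.
poems = [
    {"author": "李白", "paragraphs": ["床前明月光，疑是地上霜。", "举头望明月，低头思故乡。"]},
    {"author": "李白", "paragraphs": ["朝辞白帝彩云间，千里江陵一日还。"]},
    {"author": "王之涣", "paragraphs": ["白日依山尽，黄河入海流。"]},
]

# Punctuation to exclude from the character counts.
PUNCTUATION = set("，。？！、；：")

def author_ranking(records):
    """Return (author, number_of_works) pairs, most prolific first."""
    return Counter(r["author"] for r in records).most_common()

def char_frequency(records):
    """Count each non-punctuation, non-space character across all lines."""
    counts = Counter()
    for r in records:
        for line in r["paragraphs"]:
            counts.update(
                ch for ch in line
                if ch not in PUNCTUATION and not ch.isspace()
            )
    return counts

ranking = author_ranking(poems)
freq = char_frequency(poems)
print(ranking)
print(freq.most_common(5))
```

A ci-tune count would follow the same pattern as `author_ranking`, applied to whichever field holds the tune name in the lyric records.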
Contribution
Contributions are welcome via pull requests or issue discussions to improve and expand the database.
License
The dataset is released under the MIT License.