Chinese-Poetry-Dataset
A database of classical Chinese literature, billed as the most comprehensive of its kind. It contains about 55,000 Tang poems, 260,000 Song poems, and 21,000 Song lyrics (ci), written by roughly 14,000 Tang and Song poets and about 1,500 Song lyricists. The texts were collected from sources on the Internet.
Description
Dataset Overview
Dataset Name
chinese-poetry: Most Comprehensive Classical Chinese Poetry Database
Dataset Content
- Tang Poetry: 55,000 poems
- Song Poetry: 260,000 poems
- Song Lyrics (ci): 21,000 lyrics
- Tang & Song Poets: Approximately 14,000
- Song Lyricists: Approximately 1,500
- Other Collections: The Five Dynasties Huajian Collection, the Southern Tang Two Masters' Lyrics, the Analects, the Book of Songs, Dream of the Red Chamber, the Four Books and Five Classics, and others.
Data Formats
- Complete Tang Poetry: JSON
- Complete Song Poetry: JSON
- Complete Song Lyrics: CI format
- Other Collections: Various formats
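Since the Tang and Song poetry collections are distributed as JSON, loading them can be sketched with the Python standard library. This is a minimal sketch, not the project's own tooling: it assumes each file holds a JSON array of poem records and that a collection is split across several `.json` files in one directory. The function names `load_poems` and `load_collection` are hypothetical.

```python
import json
from pathlib import Path


def load_poems(path):
    """Read one JSON file and return its records as a list of dicts.

    Assumption: the file contains a top-level JSON array.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}")
    return data


def load_collection(directory, pattern="*.json"):
    """Concatenate the records from every matching file in a directory.

    Files are visited in sorted name order so the result is reproducible.
    """
    poems = []
    for file in sorted(Path(directory).glob(pattern)):
        poems.extend(load_poems(file))
    return poems
```

Opening the files with an explicit `encoding="utf-8"` avoids platform-dependent defaults, which matters for Chinese text.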
Applications
The dataset can be used for education, research, cultural heritage preservation, and other beneficial purposes.
Analysis
- High-Frequency Word Analysis: Statistics on the most frequent words in Tang poetry, Song poetry, and Song lyrics.
- Author Works Ranking: A ranking of authors by number of works.
- Ci-Tune Statistics: Counts of the most popular ci tunes in the Song period.
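The three analyses listed above can be approximated with `collections.Counter`. The sketch below is illustrative only and is not the dataset's own analysis code. The field names `author`, `paragraphs`, and `rhythmic` (the tune name) are assumptions about the record layout and should be checked against the actual files. It also counts single characters rather than segmented words, which is a simplification of the word-level analysis described.

```python
from collections import Counter


def rank_authors(poems, top=10):
    """Rank authors by number of works (assumed field: "author")."""
    return Counter(p.get("author", "unknown") for p in poems).most_common(top)


def char_frequencies(poems, top=10):
    """Count CJK characters across all lines (assumed field: "paragraphs").

    Punctuation and non-CJK characters are skipped by checking the
    CJK Unified Ideographs range U+4E00..U+9FFF.
    """
    counts = Counter()
    for p in poems:
        for line in p.get("paragraphs", []):
            counts.update(ch for ch in line if "\u4e00" <= ch <= "\u9fff")
    return counts.most_common(top)


def tune_counts(lyrics, top=10):
    """Count ci tunes (assumed field: "rhythmic"); records without it are skipped."""
    return Counter(p["rhythmic"] for p in lyrics if "rhythmic" in p).most_common(top)
```

Each function returns a list of `(item, count)` pairs sorted from most to least frequent, so the results can be printed or plotted directly.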
Contribution
Contributions are welcome via pull requests or issue discussions to improve and expand the database.
License
The dataset is released under the MIT License.
Source
Organization: GitHub
Created: 12/18/2017