JUHE API Marketplace

Chinese-Poetry-Dataset

Billed as the most comprehensive database of classical Chinese literature, it contains 55,000 Tang poems, 260,000 Song poems, and 21,000 Song lyrics (ci), covering roughly 14,000 poets of the Tang and Song dynasties and about 1,500 Song lyricists. The texts were collected from the Internet.

Updated 4/16/2024

Description

Dataset Overview

Dataset Name

chinese-poetry: Most Comprehensive Classical Chinese Poetry Database

Dataset Content

  • Tang Poetry: 55,000 poems
  • Song Poetry: 260,000 poems
  • Song Lyrics (ci): 21,000 pieces
  • Tang & Song Poets: approximately 14,000
  • Song Lyricists: approximately 1,500
  • Other Collections: the Huajian Collection (Five Dynasties), the Lyrics of the Two Rulers of Southern Tang, the Analects, the Book of Songs, Dream of the Red Chamber, the Four Books and Five Classics, and others

Data Formats

  • Complete Tang Poetry: JSON
  • Complete Song Poetry: JSON
  • Complete Song Lyrics: CI format
  • Other Collections: Various formats
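The JSON collections can be read with Python's standard library alone. The following is a minimal sketch, assuming each file holds a JSON array of poem objects with keys such as "author", "title", and "paragraphs" (a list of lines); these key names and the directory layout are assumptions, not details confirmed on this page.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_poems(data_dir: str, pattern: str = "*.json") -> Iterator[dict]:
    """Yield every poem record from the JSON files in data_dir.

    Assumption: each matching file contains a JSON array of objects,
    e.g. {"author": ..., "title": ..., "paragraphs": [line, ...]}.
    """
    # Sort the paths so records come out in a stable, reproducible order.
    for path in sorted(Path(data_dir).glob(pattern)):
        with path.open(encoding="utf-8") as f:
            # json.load reads one whole file; records are then yielded one by one.
            for record in json.load(f):
                yield record
```

Because the function is a generator, only one file is held in memory at a time, which matters for large collections such as the 260,000 Song poems. A call like iter_poems("path/to/json_dir") (a hypothetical path) can be looped over directly or wrapped in list().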

Applications

The dataset can be used for education, research, cultural heritage preservation, and other beneficial purposes.

Analysis

  • High-Frequency Word Analysis: statistics on the most frequent words in Tang poetry, Song poetry, and Song lyrics.
  • Author Works Ranking: authors ranked by number of works.
  • Ci-Tune Statistics: counts of the most popular ci tunes (cipai) of the Song period.
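All three analyses reduce to frequency counting. A minimal sketch with collections.Counter, assuming each record is a dict with a "paragraphs" list of lines and an "author" field, and, hypothetically, that Song-lyric records store the tune name under a "rhythmic" key. Characters are counted instead of words as a simplification: classical Chinese is written without spaces, so true word counts would need a segmenter.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def char_frequencies(poems: Iterable[dict]) -> Counter:
    """Count Han characters across every line of every poem."""
    counts: Counter = Counter()
    for poem in poems:
        for line in poem.get("paragraphs", []):
            # Keep only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF),
            # which drops punctuation such as "，" and "。".
            counts.update(ch for ch in line if "\u4e00" <= ch <= "\u9fff")
    return counts


def author_ranking(poems: Iterable[dict]) -> List[Tuple[str, int]]:
    """Authors sorted by number of works, most prolific first."""
    return Counter(p.get("author", "") for p in poems).most_common()


def ci_tune_counts(lyrics: Iterable[dict]) -> List[Tuple[str, int]]:
    """Ci tune names sorted by frequency.

    Assumption (hypothetical key): the tune name is stored under "rhythmic";
    records without that key are skipped.
    """
    return Counter(p["rhythmic"] for p in lyrics if "rhythmic" in p).most_common()
```

Each function consumes its argument, so wrap a generator in list() before passing the same records to more than one function. char_frequencies(poems).most_common(10) then returns the ten most frequent characters.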

Contribution

Contributions are welcome via pull requests or issue discussions to improve and expand the database.

License

The dataset is released under the MIT License.



Topics

Classical Literature
Cultural Heritage

Source

Organization: github

Created: 12/18/2017
