Dataset asset, Open Source Community, Text Mining, Korean Pop Music

Kpop-lyric-datasets

A JSON‑format dataset of 25,696 Korean pop songs, sourced from Melon's monthly charts (2000 to October 2023). The repository also provides Python functions for loading the data. Use is limited to research; commercial use requires agreement with the lyricists, artists, composers, and other rights holders.

Source: GitHub
Created: Dec 2, 2023
Updated: Dec 3, 2023

Dataset Overview

Dataset Name

  • Kpop‑lyric‑datasets

Dataset Content

  • Contains 25,696 K‑pop songs in JSON format, sourced from Melon’s monthly top‑100 chart rankings (2000 to October 2023).

License

  • Available for research purposes only; commercial use requires negotiation with the rights holders (lyricists, artists, composers, and so on).

Dataset Structure

Data File Paths

  • melon/monthly-chart/melon-<year>/melon-<year>-<month>/melon-monthly_<year>-<month>_<chart rank>.json
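
The file name at the end of the path encodes the chart year, month, and rank, so those values can be recovered without opening the file. The following is a minimal sketch, not part of the repository: the names parse_chart_filename and ChartEntry are hypothetical, and because the page does not say whether the month is zero-padded, the pattern accepts one or two digits.

```python
import re
from typing import NamedTuple, Optional

# Pattern taken from the documented file name:
#   melon-monthly_<year>-<month>_<chart rank>.json
# Assumption: the month may or may not be zero-padded, so 1-2 digits are accepted.
_FILENAME_RE = re.compile(r"melon-monthly_(\d{4})-(\d{1,2})_(\d+)\.json")


class ChartEntry(NamedTuple):
    """Hypothetical container for the values encoded in a file name."""
    year: int
    month: int
    rank: int


def parse_chart_filename(name: str) -> Optional[ChartEntry]:
    """Return the year, month, and rank from a chart file name, or None if it does not match."""
    match = _FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    year, month, rank = (int(part) for part in match.groups())
    return ChartEntry(year, month, rank)
```

Passing only the final path component (for example, Path(p).name) keeps the helper independent of the directory layout.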

Data Fields

  • info: Metadata such as year, month, rank, genre, and source website.
  • song_id: Song ID in the Melon database.
  • song_name: Song title.
  • album: Album name.
  • release_date: Release date.
  • artist: Artist name.
  • genre: Genre.
  • lyric_writer: Lyricist.
  • composer: Composer.
  • arranger: Arranger.
  • lyrics: Lyrics content, including line count and text.
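
A quick way to sanity-check a loaded record against the field list above is to look for missing keys. This is a rough sketch that assumes every listed field is a top-level key of the JSON object, which the page does not confirm; missing_fields is a hypothetical helper, and the sample record holds placeholder values rather than real data.

```python
# Field names copied from the list above.
# Assumption: each one is a top-level key in a song's JSON object.
EXPECTED_FIELDS = (
    "info", "song_id", "song_name", "album", "release_date", "artist",
    "genre", "lyric_writer", "composer", "arranger", "lyrics",
)


def missing_fields(record: dict) -> list:
    """Return the expected field names that are absent from `record`, in list order."""
    return [field for field in EXPECTED_FIELDS if field not in record]


# Placeholder record for illustration only; it omits "arranger" and "lyrics".
sample = {field: None for field in EXPECTED_FIELDS[:-2]}
print(missing_fields(sample))
```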

Usage

Data Retrieval

  • Get 2023 data: data_parser.get_dict(2023) returns a dictionary.
  • Get 2010‑2022 data: data_parser.get_df(2010, 2022) returns a Pandas DataFrame.
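
The data_parser functions above are the repository's documented interface. When that module is not at hand, the JSON files can also be read directly with the standard library. The sketch below is a stand-in under stated assumptions, not the repository's implementation: load_year is a hypothetical name, the files are assumed to be UTF-8 encoded, and the search is recursive so that it depends only on the documented file name pattern, not on the exact directory layout.

```python
import json
from pathlib import Path


def load_year(root, year):
    """Load every chart JSON for `year` found anywhere under `root`.

    Returns a dict keyed by file stem (the file name without ".json").
    This is an illustrative stand-in, not data_parser.get_dict itself.
    """
    records = {}
    # Match the documented name: melon-monthly_<year>-<month>_<chart rank>.json
    pattern = f"melon-monthly_{year}-*_*.json"
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8") as handle:  # assumption: UTF-8 files
            records[path.stem] = json.load(handle)
    return records
```

After cloning, calling load_year with the path to the local copy and a year such as 2023 collects that year's records; the structure of the returned dictionary here is a choice made for this sketch and may differ from what data_parser.get_dict returns.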

Cloning

  • Clone with git clone https://github.com/EX3exp/Kpop-lyric-datasets.git.