Dataset asset: Open Source Community, Sentiment Analysis, Lyric Analysis

Synthetic Lyrics Dataset

A synthetic lyrics dataset obtained via the Genius API and web crawling, annotated with theme, emotion, style, tone, and narrative using the Mistral API.

Source: GitHub
Created: Mar 22, 2024
Updated: Apr 2, 2024

Dataset Overview

Dataset Name

  • Synthetic Lyrics Dataset with Mistral 7B

Data Collection Method

  • Artist ID acquisition
  • Song URL acquisition
  • Lyrics web crawling
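The first two collection steps can be sketched as follows. This is a minimal illustration, not the dataset author's code: the endpoint path and the `response.songs[].url` / `response.next_page` fields reflect the public Genius API as commonly documented and should be checked against the current API reference. The helper names and the sample payload are hypothetical. The HTTP request itself (which needs an access token) and the lyrics crawling step are left out, since they depend on credentials and on the page markup.

```python
from __future__ import annotations

from typing import Any

# Assumed base URL of the Genius API; verify against the official docs.
GENIUS_API_BASE = "https://api.genius.com"


def artist_songs_endpoint(artist_id: int, page: int = 1, per_page: int = 50) -> str:
    """Build the URL that lists one page of songs for an artist ID."""
    return f"{GENIUS_API_BASE}/artists/{artist_id}/songs?per_page={per_page}&page={page}"


def extract_song_urls(payload: dict[str, Any]) -> list[str]:
    """Collect song page URLs from one decoded JSON response."""
    songs = payload.get("response", {}).get("songs", [])
    return [song["url"] for song in songs if song.get("url")]


def next_page(payload: dict[str, Any]) -> int | None:
    """Return the next page number, or None when there are no more pages."""
    return payload.get("response", {}).get("next_page")


# Hypothetical payload shaped like a songs listing, for illustration only.
sample_payload = {
    "response": {
        "songs": [
            {"id": 1, "url": "https://genius.com/example-artist-song-one-lyrics"},
            {"id": 2, "url": "https://genius.com/example-artist-song-two-lyrics"},
            {"id": 3},  # entries without a URL are skipped
        ],
        "next_page": None,
    }
}

song_urls = extract_song_urls(sample_payload)
```

Each URL in `song_urls` would then be handed to the crawler that fetches the lyrics page.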

Data Annotation

  • Annotated with the Mistral API through LangChain, covering theme, emotion, style, tone, and narrative
  • Annotation example: text style is obtained with a dedicated prompt template that asks for a concise answer of only three words, with no explanation
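The style annotation described above might use a prompt along these lines. The exact wording of the original template is not given in the source, so the text of `STYLE_TEMPLATE`, the helper names, and the `max_chars` cut-off are all assumptions. Plain string formatting stands in for LangChain's prompt templating so the sketch runs without extra dependencies, and the model call itself is omitted.

```python
# Hypothetical template; the source only states that it requests a concise
# three-word answer with no explanation.
STYLE_TEMPLATE = (
    "Describe the text style of the following song lyrics.\n"
    "Answer concisely with only three words and no explanation.\n\n"
    "Lyrics:\n{lyrics}\n\n"
    "Style:"
)


def build_style_prompt(lyrics: str, max_chars: int = 4000) -> str:
    """Fill the template, truncating very long lyrics (assumed limit)."""
    return STYLE_TEMPLATE.format(lyrics=lyrics.strip()[:max_chars])


def parse_three_words(answer: str) -> list[str]:
    """Normalise a model reply into at most three lower-case words."""
    words = [w.strip(" .,;:\n\t").lower() for w in answer.replace(",", " ").split()]
    return [w for w in words if w][:3]


prompt = build_style_prompt("I walk alone beneath the city lights")
labels = parse_three_words("Melancholic, poetic, introspective.")
```

Parsing the reply defensively is a design choice: models do not always respect a "three words only" instruction, so trimming punctuation and capping the word count keeps the label column consistent.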

Dataset Scale and Cost

  • Approximately 350 input tokens per lyric annotation
  • About 14,700,000 tokens in total
  • Total cost roughly $4
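The figures above imply some simple arithmetic, shown here as a rough sketch. It assumes the total counts only input tokens at about 350 per annotation request and ignores output tokens. The source does not say whether one request covers all five fields or one field each, so the result is labelled as a request count rather than a lyric count. Function names are illustrative.

```python
TOKENS_PER_ANNOTATION = 350    # approximate input tokens per annotation (from the listing)
TOTAL_TOKENS = 14_700_000      # approximate total tokens (from the listing)
TOTAL_COST_USD = 4.0           # approximate total cost (from the listing)


def estimate_requests(total_tokens: int, tokens_per_request: int) -> int:
    """Estimate how many annotation requests the token total corresponds to."""
    return total_tokens // tokens_per_request


def cost_per_million_tokens(total_cost: float, total_tokens: int) -> float:
    """Derive the effective price per one million tokens."""
    return total_cost / (total_tokens / 1_000_000)


requests_estimate = estimate_requests(TOTAL_TOKENS, TOKENS_PER_ANNOTATION)
price_per_million = cost_per_million_tokens(TOTAL_COST_USD, TOTAL_TOKENS)

print(f"Estimated annotation requests: {requests_estimate:,}")
print(f"Effective price: ${price_per_million:.2f} per million tokens")
```

Under these assumptions the totals correspond to about 42,000 annotation requests at roughly $0.27 per million tokens.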

Dataset Applications

  • Fine‑tune language models to support
    • Song classification
    • Lyrics generation
    • Recommendation systems