Synthetic Lyrics Dataset

A synthetic lyrics dataset: the lyrics are collected through the Genius API and web crawling, and each one is annotated with theme, emotion, style, tone, and narrative labels generated through the Mistral API.

Updated 4/2/2024

Description

Dataset Overview

Dataset Name

  • Synthetic Lyrics Dataset with Mistral 7B

Data Collection Method

  • Artist ID acquisition
  • Song URL acquisition
  • Lyrics web crawling
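
The three collection steps above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual code: it assumes the public Genius API endpoints `/search` and `/artists/{id}/songs` with a bearer token, and it assumes lyric pages mark their lyric blocks with a `data-lyrics-container="true"` attribute, which may change without notice. All function and class names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API_ROOT = "https://api.genius.com"  # Genius API base URL
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag


def api_get(path, params, token):
    """GET a Genius API endpoint and return the decoded 'response' object."""
    url = f"{API_ROOT}{path}?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["response"]


def artist_id_from_search(search_response, artist_name):
    """Step 1: pick the matching primary artist ID out of a /search response."""
    for hit in search_response.get("hits", []):
        artist = hit["result"]["primary_artist"]
        if artist["name"].lower() == artist_name.lower():
            return artist["id"]
    return None


def song_urls_from_page(songs_response):
    """Step 2: collect song page URLs from one /artists/{id}/songs page."""
    return [song["url"] for song in songs_response.get("songs", [])]


def collect_song_urls(artist_id, token, per_page=50):
    """Step 2 (driver): follow 'next_page' until the API reports no more pages."""
    urls, page = [], 1
    while page:
        resp = api_get(f"/artists/{artist_id}/songs",
                       {"per_page": per_page, "page": page}, token)
        urls.extend(song_urls_from_page(resp))
        page = resp.get("next_page")
    return urls


class LyricsExtractor(HTMLParser):
    """Step 3: keep only text nested inside the assumed lyrics containers."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside a lyrics container (0 = outside)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br" and self.depth:
            self.parts.append("\n")  # line breaks separate lyric lines
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("data-lyrics-container") == "true":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_lyrics(html):
    """Return the lyric text found in a song page's HTML."""
    parser = LyricsExtractor()
    parser.feed(html)
    return "".join(parser.parts).strip()
```

Keeping the parsing functions separate from the network call makes each step easy to check against saved responses before running a full crawl.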

Data Annotation

  • Annotations for theme, emotion, style, tone, and narrative are generated with the Mistral API, called through LangChain
  • Annotation example: the text style is obtained with a dedicated prompt template that asks for a concise answer of exactly three words and no explanation
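
A prompt of the kind described above might look like the sketch below. The template wording is hypothetical (the original template is not reproduced here), and a plain Python format string stands in for the LangChain prompt template so the example has no dependencies. The `parse_three_words` helper is also an assumption: it shows one way to reject answers that ignore the three-word constraint.

```python
import re

# The five attributes named in the dataset description.
FIELDS = ("theme", "emotion", "style", "tone", "narrative")

# Hypothetical template; the dataset's real wording is not known.
TEMPLATE = (
    "Describe the {field} of the following song lyrics.\n"
    "Answer with exactly three words and no explanation.\n\n"
    "Lyrics:\n{lyrics}\n\n"
    "{field_title}:"
)


def build_prompt(field, lyrics):
    """Fill the template for one attribute of one lyric."""
    if field not in FIELDS:
        raise ValueError(f"unknown annotation field: {field!r}")
    return TEMPLATE.format(field=field, field_title=field.title(), lyrics=lyrics)


def parse_three_words(answer):
    """Return the three lowercase words of a model answer, or None if it is not three words."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", answer)
    if len(words) != 3:
        return None
    return [w.lower() for w in words]
```

The string returned by `build_prompt("style", lyrics)` would be sent to the model, and any answer for which `parse_three_words` returns `None` could be retried or discarded.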

Dataset Scale and Cost

  • Approximately 350 input tokens per lyric annotation
  • Total about 14,700,000 tokens
  • Total cost roughly $4
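
The figures above imply a call count and an effective price per token, worked out in the short calculation below. It uses only the three rounded numbers listed, so the results are approximate, and it assumes the total is made up entirely of roughly 350-token inputs (output tokens are not itemized in the description).

```python
TOKENS_PER_ANNOTATION = 350      # approximate input tokens per annotation
TOTAL_TOKENS = 14_700_000        # approximate total tokens
TOTAL_COST_USD = 4.0             # approximate total cost

# Number of annotation calls implied by the totals.
annotations = TOTAL_TOKENS // TOKENS_PER_ANNOTATION

# Effective price per million tokens implied by the totals.
usd_per_million_tokens = TOTAL_COST_USD / (TOTAL_TOKENS / 1_000_000)

print(f"{annotations:,} annotation calls")
print(f"${usd_per_million_tokens:.3f} per million tokens")
```

Under these assumptions the totals correspond to about 42,000 annotation calls at roughly $0.27 per million tokens.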

Dataset Applications

  • Fine-tune language models to support:
    • Song classification
    • Lyrics generation
    • Recommendation systems

Topics

Lyric Analysis
Sentiment Analysis

Source

Organization: github

Created: 3/22/2024
