Open Source Community
Synthetic Lyrics Dataset
A synthetic lyrics dataset obtained via the Genius API and web crawling, annotated with theme, emotion, style, tone, and narrative using the Mistral API.
Updated 4/2/2024
Description
Dataset Overview
Dataset Name
- Synthetic Lyrics Dataset with Mistral 7B
Data Collection Method
- Acquire artist IDs through the Genius API
- Retrieve the song URLs for each artist
- Crawl each song page to collect the lyrics
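The first two collection steps can be sketched as follows. This is a minimal illustration, not the dataset author's code: it assumes the public Genius API's `/search` and `/artists/{id}/songs` endpoints with bearer-token authentication, and it assumes the response fields shown (`hits`, `primary_artist`, `songs`, `url`, `next_page`). The helper names are hypothetical. The third step, crawling each song page for its lyrics, is left out because the source does not describe the page markup it parsed.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.genius.com"


def build_request(path, params, token):
    """Build an authenticated GET request for a Genius API path."""
    url = f"{API_ROOT}{path}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_json(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def extract_artist_id(search_payload, artist_name):
    """Return the ID of the first search hit whose primary artist matches the name."""
    for hit in search_payload["response"]["hits"]:
        artist = hit["result"]["primary_artist"]
        if artist["name"].lower() == artist_name.lower():
            return artist["id"]
    return None


def extract_song_urls(songs_payload):
    """Return the song page URLs and the next page number (None when exhausted)."""
    body = songs_payload["response"]
    return [song["url"] for song in body["songs"]], body.get("next_page")


def collect_song_urls(artist_name, token):
    """Steps 1 and 2: resolve the artist ID, then page through the artist's songs."""
    search = fetch_json(build_request("/search", {"q": artist_name}, token))
    artist_id = extract_artist_id(search, artist_name)
    if artist_id is None:
        return []
    urls, page = [], 1
    while page is not None:
        request = build_request(
            f"/artists/{artist_id}/songs", {"per_page": 50, "page": page}, token
        )
        page_urls, page = extract_song_urls(fetch_json(request))
        urls.extend(page_urls)
    return urls
```

Calling `collect_song_urls("Some Artist", token)` with a valid access token would return the list of song page URLs to crawl in the third step.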
Data Annotation
- Theme, emotion, style, tone, and narrative are annotated using the Mistral API combined with LangChain
- Annotation example: text style is obtained with a dedicated prompt template that requests a concise three-word answer with no explanation
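The style annotation described above might look roughly like the sketch below. The template wording is hypothetical, since the dataset's actual prompt is not reproduced here, and a plain Python format string stands in for the LangChain prompt template so the example runs without extra dependencies. The model call itself is omitted; `parse_three_words` only validates whatever text the model returns.

```python
import re

# Hypothetical template wording; the dataset's real prompt is not shown in the source.
STYLE_TEMPLATE = (
    "Describe the writing style of the following song lyrics.\n"
    "Answer with exactly three words, separated by commas, and no explanation.\n\n"
    "Lyrics:\n{lyrics}\n\nStyle:"
)


def build_style_prompt(lyrics):
    """Fill the template with one song's lyrics."""
    return STYLE_TEMPLATE.format(lyrics=lyrics.strip())


def parse_three_words(answer):
    """Return the three style words, or None if the answer breaks the format."""
    words = [part.strip(" .\n\t").lower() for part in re.split(r"[,\n]+", answer)]
    words = [word for word in words if word]
    if len(words) != 3 or any(" " in word for word in words):
        return None
    return words
```

Rejecting malformed answers instead of trying to repair them keeps the labels uniform; a rejected lyric can simply be sent to the model again.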
Dataset Scale and Cost
- Approximately 350 input tokens per lyric annotation
- Total about 14,700,000 tokens
- Total cost roughly $4
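As a quick sanity check on these figures, the arithmetic below derives the implied number of annotation calls and the implied price per million tokens. It assumes the 14,700,000 total consists entirely of input tokens at the stated average of 350 per annotation, which the source does not state explicitly.

```python
TOKENS_PER_ANNOTATION = 350   # stated average input tokens per lyric annotation
TOTAL_TOKENS = 14_700_000     # stated total token count
TOTAL_COST_USD = 4.0          # stated approximate total cost

# Implied number of annotation calls, assuming every token is an input token.
annotation_calls = TOTAL_TOKENS / TOKENS_PER_ANNOTATION

# Implied blended price in US dollars per million tokens.
usd_per_million_tokens = TOTAL_COST_USD / (TOTAL_TOKENS / 1_000_000)

print(f"{annotation_calls:,.0f} annotation calls")
print(f"${usd_per_million_tokens:.2f} per million tokens")
```

Under that assumption the figures imply 42,000 annotation calls at about $0.27 per million tokens. Whether each call covers a whole lyric or a single lyric-attribute pair is not specified.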
Dataset Applications
- Fine-tune language models to support:
  - Song classification
  - Lyrics generation
  - Recommendation systems
Topics
Lyric Analysis
Sentiment Analysis
Source
Organization: github
Created: 3/22/2024