Dataset asset · Open Source Community · Sentiment Analysis · Lyric Analysis
Synthetic Lyrics Dataset
A synthetic lyrics dataset obtained via the Genius API and web crawling, annotated with theme, emotion, style, tone, and narrative using the Mistral API.
Source: GitHub
Created: Mar 22, 2024
Updated: Apr 2, 2024
Dataset Overview
Dataset Name
- Synthetic Lyrics Dataset with Mistral 7B
Data Collection Method
- Artist ID acquisition
- Song URL acquisition
- Lyrics web crawling
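The first two collection steps can be sketched as below. This is a minimal illustration, not the dataset author's code: it assumes the public Genius API endpoints `/search` and `/artists/{id}/songs` and the JSON layout they return, and the helper names are hypothetical. The crawling step is omitted because the page markup used for lyric extraction is not described here.

```python
from typing import List, Optional
from urllib.parse import urlencode

API_ROOT = "https://api.genius.com"  # assumed Genius API base URL


def search_url(artist_name: str) -> str:
    """Build the search request used to look up an artist by name."""
    return f"{API_ROOT}/search?{urlencode({'q': artist_name})}"


def artist_songs_url(artist_id: int, page: int = 1, per_page: int = 50) -> str:
    """Build the request that lists one page of an artist's songs."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{API_ROOT}/artists/{artist_id}/songs?{query}"


def extract_artist_id(search_payload: dict, artist_name: str) -> Optional[int]:
    """Step 1: find the artist ID in a decoded search response (assumed layout)."""
    for hit in search_payload.get("response", {}).get("hits", []):
        artist = hit.get("result", {}).get("primary_artist", {})
        if artist.get("name", "").casefold() == artist_name.casefold():
            return artist.get("id")
    return None


def extract_song_urls(songs_payload: dict) -> List[str]:
    """Step 2: collect the song page URLs from a decoded songs response."""
    songs = songs_payload.get("response", {}).get("songs", [])
    return [song["url"] for song in songs if "url" in song]
```

Each request would also need an `Authorization: Bearer <token>` header carrying a Genius access token; the song URLs returned by step 2 are the pages a crawler would then fetch for lyrics.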
Data Annotation
- Lyrics are annotated for theme, emotion, style, tone, and narrative using the Mistral API together with LangChain
- Annotation example: the text style is obtained with a prompt template that asks for a concise answer of only three words, with no explanation
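A prompt template of the kind described above might look like the following sketch. The template wording, the `build_prompt` and `parse_labels` helpers, and the choice to reuse one template for all five fields are assumptions for illustration; the original prompt text is not reproduced on this page. The code uses plain string formatting so it runs without LangChain, though the same template string could be passed to a LangChain prompt template.

```python
import re
from typing import List

# The five annotation fields named in the dataset description.
FIELDS = ("theme", "emotion", "style", "tone", "narrative")

# Hypothetical template wording; only the "three words, no explanation"
# constraint comes from the dataset description.
TEMPLATE = (
    "Describe the {field} of the following lyrics.\n"
    "Answer concisely with only three words and no explanation.\n\n"
    "Lyrics:\n{lyrics}\n\n"
    "{field_title}:"
)


def build_prompt(lyrics: str, field: str = "style") -> str:
    """Fill the template for one annotation field."""
    if field not in FIELDS:
        raise ValueError(f"unknown annotation field: {field!r}")
    return TEMPLATE.format(
        field=field, field_title=field.capitalize(), lyrics=lyrics.strip()
    )


def parse_labels(answer: str, limit: int = 3) -> List[str]:
    """Keep at most `limit` words from the model's answer, lowercased."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", answer)
    return [word.lower() for word in words[:limit]]
```

The prompt returned by `build_prompt` would be sent to the model, and `parse_labels` guards against answers that ignore the three-word limit by truncating them.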
Dataset Scale and Cost
- Approximately 350 input tokens per lyric annotation
- About 14,700,000 tokens in total
- Total cost of roughly $4
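The figures above imply a number of annotation requests and an effective price per million tokens. The short calculation below derives both from the stated values only; the variable names are illustrative, and the results are back-of-the-envelope estimates because the source figures are themselves approximate.

```python
TOKENS_PER_ANNOTATION = 350   # approximate input tokens per lyric annotation
TOTAL_TOKENS = 14_700_000     # approximate total tokens
TOTAL_COST_USD = 4.0          # approximate total cost

# Number of annotation requests implied by the token counts.
annotation_calls = TOTAL_TOKENS // TOKENS_PER_ANNOTATION

# Effective price per one million tokens implied by the total cost.
usd_per_million_tokens = TOTAL_COST_USD / (TOTAL_TOKENS / 1_000_000)

print(f"Implied annotation calls: {annotation_calls:,}")
print(f"Implied price: ${usd_per_million_tokens:.2f} per million tokens")
```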
Dataset Applications
- Fine-tune language models to support:
  - Song classification
  - Lyrics generation
  - Recommendation systems