Dataset asset · Open Source Community · Audio Classification · Music Genre Classification

ccmusic-database/music_genre

The dataset comprises approximately 1,700 music excerpts in .mp3 format, each lasting 270–300 seconds and sampled at 22 kHz. The excerpts are taken from NetEase Cloud Music and labelled with 16 genre categories. The dataset is divided into a Raw Subset and an Eval Subset, each providing different audio features and annotations. It was created to foster AI research in the music industry and was mainly collected and annotated by students. The dataset is intended for audio-classification tasks, and its metadata lists Chinese and English as its languages.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Mar 21, 2025

Dataset Overview

Basic Information

  • Name: Music Genre Dataset
  • License: MIT
  • Languages: Chinese, English
  • Tags: music, art
  • Size category: 10K < n < 100K examples

Description

  • Overview: Contains about 1,700 music pieces in .mp3 format, each 270–300 seconds long, sampled at 22 kHz. The pieces are divided into 16 distinct music styles.
  • Source: Tracks come from NetEase Cloud Music; the genre tags supplied by the platform accompany the downloaded audio.
  • Classification: 16 music styles.
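To get a feel for the data volume, the stated duration and sample rate translate directly into per-excerpt sample counts. The sketch below assumes that "22 kHz" means the common 22,050 Hz rate and that spectrogram frames come from a centered STFT with a hop length of 512 (librosa's default); both values are assumptions for illustration, not figures from the dataset card.

```python
# Assumption: "22 kHz" is read as the common 22,050 Hz rate.
SAMPLE_RATE_HZ = 22_050
# Assumption: hop length of 512 samples, librosa's default for its STFT.
HOP_LENGTH = 512


def samples_for(duration_s: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of mono samples in an excerpt of the given length in seconds."""
    return round(duration_s * sample_rate_hz)


def stft_frames(n_samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frame count of a centered STFT (signal padded by n_fft // 2 on both sides)."""
    return 1 + n_samples // hop_length


# Shortest and longest excerpts in the stated 270-300 s range.
for seconds in (270, 300):
    n = samples_for(seconds)
    print(f"{seconds} s -> {n:,} samples, {stft_frames(n):,} frames at hop {HOP_LENGTH}")
```

Under these assumptions, a single excerpt holds roughly 6 million samples, which is one reason spectrogram or clip-level representations are commonly used instead of raw waveforms.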

Structure

  • Audio Format: .mp3
  • Sample Rate: 22 kHz
  • Duration Range: 270–300 seconds
  • Label Taxonomy: Three‑level hierarchy (2 coarse classes, 9 middle classes, 16 fine‑grained classes).
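Assuming each fine-grained label belongs to exactly one middle class and each middle class to exactly one coarse class, predictions made at the finest level can be scored at all three levels by rolling labels up the hierarchy. The mapping below is hypothetical: it only matches the stated class counts (16 fine, 9 middle, 2 coarse), and the real assignments must be read from the dataset's label metadata. The helper names are likewise illustrative.

```python
# Hypothetical mapping for illustration only; replace with the dataset's real
# fine -> middle -> coarse assignments.
FINE_TO_MIDDLE = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 8]  # index = fine label
MIDDLE_TO_COARSE = [0, 0, 1, 1, 1, 1, 1, 1, 1]                      # index = middle label

# Sanity-check that the mapping matches the stated 16 / 9 / 2 class counts.
assert len(FINE_TO_MIDDLE) == 16 and len(set(FINE_TO_MIDDLE)) == 9
assert len(MIDDLE_TO_COARSE) == 9 and len(set(MIDDLE_TO_COARSE)) == 2


def roll_up(fine_label: int) -> tuple[int, int]:
    """Map a fine-grained label to its (middle, coarse) ancestors."""
    middle = FINE_TO_MIDDLE[fine_label]
    return middle, MIDDLE_TO_COARSE[middle]


def accuracy_per_level(y_true, y_pred) -> dict:
    """Accuracy at the fine, middle, and coarse levels for fine-grained predictions."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    hits = {"fine": 0, "middle": 0, "coarse": 0}
    for t, p in zip(y_true, y_pred):
        (t_mid, t_coarse), (p_mid, p_coarse) = roll_up(t), roll_up(p)
        hits["fine"] += t == p
        hits["middle"] += t_mid == p_mid
        hits["coarse"] += t_coarse == p_coarse
    return {level: count / len(y_true) for level, count in hits.items()}


# Two fine-level mistakes, one of which stays within the same middle class.
print(accuracy_per_level([0, 2, 13], [1, 2, 14]))  # fine 1/3, middle 2/3, coarse 1.0
```

Reporting all three levels makes it visible whether a model's errors are near misses between related genres or confusions across the coarse split.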

Usage Example

  • Loading: Use the load_dataset function from the Hugging Face datasets library; both the default and eval subsets are available.
  • Processing: The dataset includes train, validation, and test splits and supports audio‑classification tasks.
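A minimal loading helper might look like the sketch below. The repository id and subset names come from this page and the split names from the list above; the helper names load_kwargs and load_split are hypothetical, and the actual download needs the Hugging Face datasets package plus network access.

```python
# Repository id and subset names as listed on this page.
REPO_ID = "ccmusic-database/music_genre"
SUBSETS = ("default", "eval")
SPLITS = ("train", "validation", "test")


def load_kwargs(subset: str = "default", split: str = "train") -> dict:
    """Validate the subset and split, then build keyword arguments for load_dataset."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return {"path": REPO_ID, "name": subset, "split": split}


def load_split(subset: str = "default", split: str = "train"):
    """Download (or read from the local cache) one split of one subset."""
    # Imported lazily so the validation helper works without the package installed.
    from datasets import load_dataset

    return load_dataset(**load_kwargs(subset, split))
```

For example, load_split("eval", "test") would return a datasets Dataset whose features attribute describes that subset's columns. Validating names up front gives a clear error before any network request is made.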

Creation

  • Collection & Annotation: Collected and annotated by CCMUSIC students; 1,700 pieces grouped into 17 styles.
  • Copyright: Only spectrograms are provided due to copyright restrictions.

Notes

  • Language Bias: The majority of tracks are in English.
  • Sample Balance: The class distribution is imbalanced.
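One common way to counter class imbalance during training is to weight the loss by inverse class frequency. The sketch below uses the same n_samples / (n_classes * count) formula as scikit-learn's "balanced" class-weight heuristic; the function name is hypothetical, and the toy labels are made up for illustration and do not reflect the dataset's real class counts.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes: int = 16) -> list:
    """Per-class weights n_samples / (num_classes * count); rarer classes weigh more.

    Classes that never occur in `labels` get weight 0.0 to avoid division by zero.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Toy labels: class 0 is four times as frequent as class 1.
print(inverse_frequency_weights([0, 0, 0, 0, 1], num_classes=2))  # [0.625, 2.5]
```

The resulting list can be passed as per-class weights to a weighted cross-entropy loss, so that errors on under-represented genres contribute more to the gradient.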

License

  • Type: MIT License
  • Copyright Holder: CCMUSIC
  • Conditions: Permission is granted to use, copy, modify, merge, publish, distribute, sublicense, and sell copies, provided the copyright notice and permission notice are retained in all copies.

Citation

  • Authors: Monan Zhou, Shenyang Xu, Zhaorui Liu, Zhaowen Wang, Feng Yu, Wei Li, Baoqiang Han
  • Title: CCMusic: an Open and Diverse Database for Chinese and General Music Information Retrieval Research
  • Year: 2024
  • Version: 1.2
  • URL: https://huggingface.co/ccmusic-database