High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

Browse by Category

AudioDataset

A repository of audio datasets covering speech, music, and audio mixtures. It includes speech datasets such as VCTK and LibriSpeech, music datasets such as StarNet, and audio mixture datasets such as Libri2Mix and Divide and Remaster (DnR).

github

CN-Celeb

Speaker Identification

Audio Data

The CN-Celeb dataset is used for speaker recognition. The raw data are in FLAC format and can be used directly for training.

github
