Datasets | JuheAPI

M-AILABS Speech Dataset

Speech Recognition

Speech Synthesis

The M-AILABS Speech Dataset is the first large-scale dataset we provide free of charge for training both speech recognition and speech synthesis models. Most of the data is derived from LibriVox and Project Gutenberg and comprises nearly a thousand hours of audio with aligned text files. Every audio segment is transcribed, and segments range from 1 to 20 seconds in length. The texts were published between 1884 and 1964 and are in the public domain. The audio recordings, taken from the LibriVox project, are also in the public domain, except for the Ukrainian recordings, which were supplied by Nash Format or Gwara Media and are intended solely for machine-learning use.
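
The stated 1 to 20 second segment length can be checked after download. The following is a minimal sketch, assuming the clips are uncompressed WAV files readable by Python's standard `wave` module; the helper names `clip_duration_seconds` and `in_expected_range` are hypothetical and not part of the dataset's tooling.

```python
import wave


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def in_expected_range(duration, low=1.0, high=20.0):
    """Check a duration against the 1 to 20 second range quoted in the description."""
    return low <= duration <= high
```

A clip for which `in_expected_range(clip_duration_seconds(path))` is false may simply be worth inspecting by hand before training.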

LibriSeVoc

Speech Synthesis

Dataset

We provide the LibriSeVoc dataset, which contains self-vocoded samples generated by six state-of-the-art neural vocoders. The goal is to highlight and exploit vocoder-induced artifacts. The underlying real data are sourced from LibriTTS, and the files follow the LibriTTS naming convention.

VCTK Corpus

Speech Recognition

Speech Synthesis

This repository provides full-context label files for the VCTK corpus, created following the preprocessing steps in r9y9/deepvoice3_pytorch. It includes both full (full-context) and mono label files, which describe how the audio is segmented and annotated.
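
As an illustration of what a mono label file typically holds, here is a minimal parsing sketch. It assumes the common HTS-style layout, where each line reads `start end phone` and times are given in units of 100 nanoseconds; the description above does not confirm this layout, so verify it against the actual files. The function name `parse_mono_label` is hypothetical.

```python
HTS_TICKS_PER_SECOND = 10_000_000  # assumption: times are in 100 ns units


def parse_mono_label(text):
    """Parse HTS-style mono label text into (start_sec, end_sec, phone) tuples."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, phone = parts
        segments.append(
            (int(start) / HTS_TICKS_PER_SECOND, int(end) / HTS_TICKS_PER_SECOND, phone)
        )
    return segments
```

Under that assumption, a line such as `0 2500000 sil` would describe a silence segment covering the first 0.25 seconds of the utterance.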
