Explore high-quality datasets for your AI and machine learning projects.
The M‑AILABS Speech Dataset is the first large‑scale free dataset we provide for both speech recognition and speech synthesis training. The data are primarily derived from LibriVox and Project Gutenberg, containing nearly a thousand hours of audio and aligned text files. Each segment is transcribed, ranging from 1 to 20 seconds. Texts were published between 1884 and 1964 and are in the public domain. Audio recordings are also public domain from the LibriVox project, except for Ukrainian recordings, which are supplied by Nash Format or Gwara Media and are intended solely for machine‑learning use.
We provide the LibriSeVoc dataset, which contains self‑vocoded samples generated by six state‑of‑the‑art neural vocoders. The goal is to highlight and exploit vocoder‑induced artifacts. The underlying real data are sourced from LibriTTS, following its naming convention.
This repository provides full‑context label files for the VCTK corpus. These label files were created following the preprocessing steps in r9y9/deepvoice3_pytorch. The dataset includes both full and mono label files, detailing the segmentation and annotation format of the audio data.