cdminix/libritts-aligned

The LibriTTS Corpus with Forced Alignments dataset is a speech dataset for automatic speech recognition (ASR) and text-to-speech (TTS) tasks. It includes audio files, the corresponding transcripts, phonemes, and phoneme durations. Because the alignment information is pre-processed, users do not need to run the Montreal Forced Aligner locally. A data collator is also provided for creating training batches. The dataset is divided into several splits (train, dev, test, etc.), most of which correspond to subsets of LibriSpeech.

Updated 4/26/2024

hugging_face

Description

Dataset Overview

Name: LibriTTS Corpus with Forced Alignments

Description: This dataset contains forced-alignment information for speech data, suitable for ASR and TTS tasks.

Dataset Details

Language: English (en)

Tags:

speech
audio
automatic-speech-recognition
text-to-speech

License: CC-BY-4.0

Task Categories:

Automatic Speech Recognition
Text-to-Speech

Dataset Contents:

Each entry includes an audio file ID, speaker information, transcript, start and end times, phonemes with durations, and the audio file path.
Phonemes are represented using the International Phonetic Alphabet (IPA), and durations are given in frames.
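Because durations are stored in frames, converting them to seconds requires the hop length and sample rate used to compute the frames. The sketch below assumes a hop length of 256 samples and a sample rate of 22050 Hz; these values are assumptions, so check the dataset card before relying on them. The function names are hypothetical helpers, not part of the dataset.

```python
from itertools import accumulate

# ASSUMPTION: frame parameters; verify against the dataset card.
HOP_LENGTH = 256      # samples per frame (assumed)
SAMPLE_RATE = 22050   # audio sample rate in Hz (assumed)


def frames_to_seconds(frames, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Convert a frame count to seconds: frames * hop_length / sample_rate."""
    return frames * hop_length / sample_rate


def phone_intervals(phones, durations, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Return (phone, start_s, end_s) tuples relative to the start of the utterance.

    `phones` and `durations` are parallel lists, e.g. the phonemes and
    per-phoneme frame durations of one entry (field layout assumed).
    """
    if len(phones) != len(durations):
        raise ValueError("phones and durations must have the same length")
    ends = list(accumulate(durations))   # cumulative end frame of each phone
    starts = [0] + ends[:-1]             # each phone starts where the previous ended
    return [
        (p,
         frames_to_seconds(s, hop_length, sample_rate),
         frames_to_seconds(e, hop_length, sample_rate))
        for p, s, e in zip(phones, starts, ends)
    ]
```

For a hypothetical entry, phone_intervals(["h", "ə", "l", "oʊ"], [5, 3, 4, 8]) returns one time interval per phoneme under the assumed frame parameters.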

Dataset Splits:

train: All training data except one sample per speaker reserved for validation.
dev: One sample per speaker for validation.
train.clean.100, train.clean.360, train.other.500: Training data extracted from different LibriSpeech subsets.
dev.clean, dev.other: Validation data extracted from different LibriSpeech subsets.
test.clean, test.other: Test data extracted from different LibriSpeech subsets.
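The relationship between the train and dev splits (one sample per speaker held out for validation) can be illustrated with a small sketch. The dataset already provides these splits; this code only shows the idea. It assumes each entry is a dict with a "speaker" key, and holding out the first sample seen per speaker is an illustrative choice, not necessarily the rule the dataset uses.

```python
def split_one_per_speaker(entries):
    """Hold out one entry per speaker as dev; all remaining entries go to train.

    ASSUMPTION: entries are dicts with a "speaker" key; the first entry
    encountered for each speaker is the one held out.
    """
    seen = set()
    train, dev = [], []
    for entry in entries:
        speaker = entry["speaker"]
        if speaker in seen:
            train.append(entry)
        else:
            seen.add(speaker)   # first sample for this speaker -> dev
            dev.append(entry)
    return train, dev
```

With this scheme, the dev split always has exactly one entry per speaker, and every speaker in dev also appears in train whenever that speaker has more than one sample.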

Environment Variables:

LIBRITTS_VERBOSE: Controls the verbosity of the dataset creation process.
LIBRITTS_MAX_WORKERS: Sets the maximum number of worker threads for alignment creation.
LIBRITTS_PATH: Sets the download path for LibriTTS data.
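These variables must be set before the dataset is built. A minimal sketch of doing this from Python is shown below. The value formats ("1"/"0" for verbosity, an integer string for the worker count) and the download directory are assumptions; check the dataset loading script for how each variable is actually parsed.

```python
import os


def configure_libritts_env(path, max_workers=4, verbose=True):
    """Set the LibriTTS environment variables (hypothetical helper).

    ASSUMPTION: "1"/"0" for LIBRITTS_VERBOSE and an integer string for
    LIBRITTS_MAX_WORKERS are illustrative formats, not confirmed ones.
    """
    os.environ["LIBRITTS_VERBOSE"] = "1" if verbose else "0"
    os.environ["LIBRITTS_MAX_WORKERS"] = str(max_workers)
    os.environ["LIBRITTS_PATH"] = path


if __name__ == "__main__":
    # Hypothetical download directory; set this before loading the dataset,
    # e.g. before datasets.load_dataset("cdminix/libritts-aligned", split="train").
    configure_libritts_env("/data/libritts", max_workers=8)
```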

Usage Requirements

Software Dependencies:

pip install alignments phones (required)
pip install speech-collator (optional)

Data Collator:

A data collator is provided for creating training batches.
Install it with pip install speech-collator. The collator supports custom speaker2idx and phone2idx mappings.
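The exact speech-collator API is not described here, so the following is a plain-Python sketch of what speaker2idx and phone2idx mappings might look like and how phoneme IDs could be padded into a batch. All function names and the "<pad>" token are hypothetical and not part of speech-collator; the entry fields "speaker" and "phones" are assumed.

```python
from typing import Dict, Iterable, List, Sequence

PAD = "<pad>"  # hypothetical padding token, reserved as phone index 0


def build_index(values: Iterable[str], reserved: Sequence[str] = (PAD,)) -> Dict[str, int]:
    """Map each distinct value to an integer ID: reserved tokens first, then sorted values."""
    index = {tok: i for i, tok in enumerate(reserved)}
    for value in sorted(set(values) - set(reserved)):
        index[value] = len(index)
    return index


def speaker2idx_from(entries) -> Dict[str, int]:
    # Speakers need no padding token, so nothing is reserved.
    return build_index((e["speaker"] for e in entries), reserved=())


def phone2idx_from(entries) -> Dict[str, int]:
    return build_index(p for e in entries for p in e["phones"])


def pad_phone_batch(entries, phone2idx: Dict[str, int]) -> List[List[int]]:
    """Convert each entry's phones to IDs and right-pad to the longest sequence."""
    ids = [[phone2idx[p] for p in e["phones"]] for e in entries]
    max_len = max((len(seq) for seq in ids), default=0)
    pad_id = phone2idx[PAD]
    return [seq + [pad_id] * (max_len - len(seq)) for seq in ids]
```

Sorting the values makes the mappings deterministic, so the same data always produces the same IDs; in practice the mappings would be built once over the full training split and saved.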

Topics

Speech Recognition

Text-to-Speech

Source

Organization: hugging_face

Created: Unknown