Dataset asset | Open Source Community | Speech Recognition | Text-to-Speech

cdminix/libritts-aligned

LibriTTS Corpus with Forced Alignments is a speech dataset for automatic speech recognition (ASR) and text‑to‑speech (TTS) tasks. It includes audio files, their transcripts, phonemes, and phoneme durations. The alignments are pre‑computed, so users do not need to run the Montreal Forced Aligner locally. A data collator is also provided to create training batches. The dataset is divided into several splits (train, dev, test, etc.), most of which correspond to LibriSpeech subsets.

Source: Hugging Face
Created: Nov 28, 2025
Updated: Apr 26, 2024

Dataset Overview

Name: LibriTTS Corpus with Forced Alignments

Description: This dataset contains forced‑alignment information for speech data, suitable for ASR and TTS tasks.

Dataset Details

Language: English (en)

Tags:

  • speech
  • audio
  • automatic-speech-recognition
  • text-to-speech

License: CC-BY-4.0

Task Categories:

  • Automatic Speech Recognition
  • Text‑to‑Speech

Dataset Contents:

  • Each entry includes an audio file ID, speaker information, transcript, start and end times, phonemes with durations, and the audio file path.
  • Phonemes are represented using the International Phonetic Alphabet (IPA) and durations are given in frames.
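
Because durations are stored in frames, turning them into timestamps needs a frame length in seconds. The sketch below shows one way to do that conversion. The sample rate and hop length are assumptions chosen for illustration, not values confirmed by the dataset card, and `phone_intervals` is a hypothetical helper name.

```python
from itertools import accumulate

# Assumed values for illustration only; check the dataset's own configuration.
SAMPLE_RATE = 22050  # audio samples per second (assumption)
HOP_LENGTH = 256     # audio samples per frame (assumption)


def phone_intervals(phones, durations, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Return (phone, start_sec, end_sec) tuples from per-phone frame counts."""
    if len(phones) != len(durations):
        raise ValueError("phones and durations must have the same length")
    frame_sec = hop_length / sample_rate   # duration of one frame in seconds
    ends = list(accumulate(durations))     # cumulative end frame of each phone
    starts = [0] + ends[:-1]               # each phone starts where the previous one ended
    return [(p, s * frame_sec, e * frame_sec) for p, s, e in zip(phones, starts, ends)]


# Toy input, not taken from the dataset.
example = phone_intervals(["h", "ə", "l", "oʊ"], [5, 3, 4, 10])
```

Each tuple in `example` pairs a phoneme with its start and end time, and the last end time equals the total frame count multiplied by the frame length.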

Dataset Splits:

  • train: All training data except one sample per speaker reserved for validation.
  • dev: One sample per speaker for validation.
  • train.clean.100, train.clean.360, train.other.500: Training data extracted from different LibriSpeech subsets.
  • dev.clean, dev.other: Validation data extracted from different LibriSpeech subsets.
  • test.clean, test.other: Test data extracted from different LibriSpeech subsets.
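
A minimal sketch of selecting one of the splits listed above with the Hugging Face `datasets` library. The helper `load_split` is hypothetical. Depending on the installed `datasets` version, loading a script-based dataset may need extra options or may not be supported, so treat the call as an assumption to verify locally.

```python
# Split names as listed on this page.
SPLITS = (
    "train", "dev",
    "train.clean.100", "train.clean.360", "train.other.500",
    "dev.clean", "dev.other",
    "test.clean", "test.other",
)


def load_split(name):
    """Validate the split name, then load it (hypothetical helper)."""
    if name not in SPLITS:
        raise ValueError(f"unknown split {name!r}; expected one of {SPLITS}")
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset
    return load_dataset("cdminix/libritts-aligned", split=name)
```

Calling `load_split("dev.clean")` would then download and prepare that split, while a misspelled name fails early with a clear error.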

Environment Variables:

  • LIBRITTS_VERBOSE: Controls verbosity of the dataset creation process.
  • LIBRITTS_MAX_WORKERS: Sets the maximum number of worker threads for alignment creation.
  • LIBRITTS_PATH: Sets the download path for LibriTTS data.
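
These variables can be set from Python as well as from the shell. The sketch below assumes they are read when the dataset is created, so it sets them before any loading call. The value formats shown ("1", "4", and the directory path) are assumptions, not documented values.

```python
import os

# Assumed value formats; the page does not specify what each variable accepts.
os.environ["LIBRITTS_VERBOSE"] = "1"       # request more output during dataset creation
os.environ["LIBRITTS_MAX_WORKERS"] = "4"   # cap the workers used for alignment creation
os.environ["LIBRITTS_PATH"] = os.path.join(
    os.path.expanduser("~"), "data", "libritts"  # hypothetical download directory
)
```

Environment variables are always strings, which is why the worker count is written as "4" rather than 4.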

Usage Requirements

Software Dependencies:

  • pip install alignments phones (required)
  • pip install speech-collator (optional)

Data Collator:

  • A data collator is provided for creating training batches.
  • Install via pip install speech-collator; supports custom speaker2idx and phone2idx mappings.
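
A custom `speaker2idx` or `phone2idx` mapping is a dictionary from a label to an integer id. The sketch below builds both from a handful of toy entries. It does not call speech-collator, and the field names `speaker` and `phones` as well as the example values are assumptions made for illustration.

```python
def build_index(values):
    """Map each distinct value to an integer id, assigned in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


# Toy entries with hypothetical field names and values, not real dataset rows.
entries = [
    {"speaker": "1034", "phones": ["h", "ə", "l", "oʊ"]},
    {"speaker": "2156", "phones": ["oʊ", "k"]},
]

speaker2idx = build_index(e["speaker"] for e in entries)
phone2idx = build_index(p for e in entries for p in e["phones"])
```

Sorting before numbering keeps the ids reproducible across runs, which matters when a trained model's embedding tables depend on them.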
