
cdminix/libritts-aligned

LibriTTS Corpus with Forced Alignments is a speech dataset for automatic speech recognition (ASR) and text‑to‑speech (TTS) tasks. Each entry includes an audio file, its transcript, the phonemes, and the phoneme durations. The alignment information is pre‑processed, so users do not need to run the Montreal Forced Aligner locally. A data collator is also provided to create training batches. The dataset is divided into several splits (train, dev, test, etc.), most of which correspond to subsets of LibriSpeech.

Updated 4/26/2024
hugging_face

Description

Dataset Overview

Name: LibriTTS Corpus with Forced Alignments

Description: This dataset contains forced‑alignment information for speech data, suitable for ASR and TTS tasks.

Dataset Details

Language: English (en)

Tags:

  • speech
  • audio
  • automatic-speech-recognition
  • text-to-speech

License: CC-BY-4.0

Task Categories:

  • Automatic Speech Recognition
  • Text‑to‑Speech

Dataset Contents:

  • Each entry includes an audio file ID, speaker information, transcript, start and end times, phonemes with durations, and the audio file path.
  • Phonemes are represented using the International Phonetic Alphabet (IPA) and durations are given in frames.
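
Because durations are stored in frames, turning them into timestamps requires the sample rate and hop length used during feature extraction. Neither value is stated on this page, so the constants below (22050 Hz and 256 samples) are assumptions, and the function names are hypothetical. A minimal sketch of the conversion:

```python
from typing import List, Sequence, Tuple

# Assumed values: this page does not state the sample rate or hop length.
# Check the dataset's actual feature-extraction settings before use.
ASSUMED_SAMPLE_RATE = 22050
ASSUMED_HOP_LENGTH = 256


def frames_to_seconds(
    durations: Sequence[int],
    sample_rate: int = ASSUMED_SAMPLE_RATE,
    hop_length: int = ASSUMED_HOP_LENGTH,
) -> List[float]:
    """Convert per-phoneme durations from frames to seconds."""
    return [d * hop_length / sample_rate for d in durations]


def phone_intervals(
    phones: Sequence[str],
    durations: Sequence[int],
    sample_rate: int = ASSUMED_SAMPLE_RATE,
    hop_length: int = ASSUMED_HOP_LENGTH,
) -> List[Tuple[str, float, float]]:
    """Return (phone, start_seconds, end_seconds) for each phoneme."""
    if len(phones) != len(durations):
        raise ValueError("phones and durations must have the same length")
    intervals = []
    frame_cursor = 0  # running frame offset from the start of the utterance
    for phone, n_frames in zip(phones, durations):
        start = frame_cursor * hop_length / sample_rate
        frame_cursor += n_frames
        end = frame_cursor * hop_length / sample_rate
        intervals.append((phone, start, end))
    return intervals
```

Accumulating the offset in whole frames and converting once per boundary avoids summing rounded floating-point durations.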

Dataset Splits:

  • train: All training data except one sample per speaker reserved for validation.
  • dev: One sample per speaker for validation.
  • train.clean.100, train.clean.360, train.other.500: Training data extracted from different LibriSpeech subsets.
  • dev.clean, dev.other: Validation data extracted from different LibriSpeech subsets.
  • test.clean, test.other: Test data extracted from different LibriSpeech subsets.
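
The split names above can be passed to the Hugging Face `datasets` library. The sketch below assumes the repository id `cdminix/libritts-aligned` from the listing title and the standard `load_dataset(path, split=...)` call. The helper name `load_split` is hypothetical, and depending on the installed `datasets` version, extra arguments may be needed that are not shown here.

```python
# Split names as listed on this page.
KNOWN_SPLITS = (
    "train",
    "dev",
    "train.clean.100",
    "train.clean.360",
    "train.other.500",
    "dev.clean",
    "dev.other",
    "test.clean",
    "test.other",
)


def load_split(split: str = "train"):
    """Load one split after checking the name against the documented list."""
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {KNOWN_SPLITS}")
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("cdminix/libritts-aligned", split=split)
```

Calling `load_split("dev.clean")` would download and prepare the data on first use, so the name check runs first to fail fast on a typo.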

Environment Variables:

  • LIBRITTS_VERBOSE: Controls verbosity of the dataset creation process.
  • LIBRITTS_MAX_WORKERS: Sets the maximum number of worker threads for alignment creation.
  • LIBRITTS_PATH: Sets the download path for LibriTTS data.
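
These variables would normally be set before the dataset is loaded. The page does not document the accepted value formats, so the string encodings below (`"1"`/`"0"` for verbosity, a decimal integer for workers) are assumptions, and `configure_libritts` is a hypothetical helper:

```python
import os


def configure_libritts(path: str, max_workers: int, verbose: bool) -> None:
    """Set the LibriTTS environment variables for the current process.

    The value formats are assumptions; the page lists only the variable names.
    """
    os.environ["LIBRITTS_PATH"] = str(path)
    os.environ["LIBRITTS_MAX_WORKERS"] = str(max_workers)
    os.environ["LIBRITTS_VERBOSE"] = "1" if verbose else "0"
```

For example, `configure_libritts("/data/libritts", max_workers=4, verbose=True)` would be called once at the top of a training script, before any dataset loading code runs.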

Usage Requirements

Software Dependencies:

  • pip install alignments phones (required)
  • pip install speech-collator (optional)

Data Collator:

  • A data collator is provided for creating training batches.
  • Install via pip install speech-collator; supports custom speaker2idx and phone2idx mappings.
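
Custom `speaker2idx` and `phone2idx` mappings are plain dictionaries from a symbol to an integer index. The sketch below shows one way to build them; the record layout and the field names `speaker` and `phones` are hypothetical, and how the mappings are passed to the collator should be taken from the speech-collator documentation rather than from this example.

```python
from typing import Dict, Iterable


def build_index(symbols: Iterable[str]) -> Dict[str, int]:
    """Map each distinct symbol to a stable integer index (sorted order)."""
    return {symbol: i for i, symbol in enumerate(sorted(set(symbols)))}


# Hypothetical records; real entries would come from the loaded dataset.
records = [
    {"speaker": "2086", "phones": ["h", "ə"]},
    {"speaker": "1034", "phones": ["ə", "l"]},
]

speaker2idx = build_index(r["speaker"] for r in records)
phone2idx = build_index(p for r in records for p in r["phones"])
```

Sorting before enumerating keeps the indices reproducible across runs, which matters when a trained model's embedding tables depend on them.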

Topics

Speech Recognition
Text-to-Speech

Source

Organization: hugging_face

Created: Unknown
