japanese-anime-speech-v2
japanese-anime-speech-v2 is an audio-text dataset designed for training automatic speech recognition (ASR) models. It contains 292,637 audio clips with transcriptions sourced from visual novels. The goal is to improve ASR models (e.g., OpenAI's Whisper) at transcribing dialogue from anime and similar Japanese media. Audio is in MP3 format (128 kbps), sampled at 16 kHz, and safe-content clips average 5.3 seconds. Compared with V1, the dataset is substantially larger, the audio bitrate was lowered for storage efficiency, and safe and NSFW content are provided as separate splits. Voices are predominantly female and the vocabulary leans toward love, relationships, and fantasy, so the data may not fully reflect real-world speech patterns. Future plans include expanding the data sources.
Description
Japanese Anime Speech Dataset V2
Overview
japanese-anime-speech-v2 is an audio‑text dataset for training automatic speech recognition models. The dataset contains 292,637 audio‑text pairs sourced from various visual novels.
Dataset Information
- Number of audio‑text pairs: 292,637
- Safe content audio duration: 397.54 hours (86.8%)
- Non‑safe content audio duration: 52.36 hours (13.2%)
- Average safe content audio length: 5.3 seconds
- Data source: Visual novels
- Audio format: mp3 (128 kbps)
- Latest version: V2 – 29 June 2024
Dataset Characteristics
- Audio features:
  - Sample rate: 16,000 Hz
- Text features:
  - Data type: string
Dataset Splits
- Safe content (sfw):
  - Bytes: 19,174,765,803.112
  - Samples: 271,788
- Non‑safe content (nsfw):
  - Bytes: 2,864,808,426.209
  - Samples: 20,849
Dataset Size
- Download size: 24,379,492,733 bytes
- Dataset size: 22,039,574,229.321 bytes
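The figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted on this page (the variable and function names are illustrative, not part of the dataset) and checks that the split sample counts and byte sizes add up to the reported totals, and that the safe-content duration is consistent with the stated average clip length.

```python
# Figures copied from the dataset card; names are illustrative only.
SFW_SAMPLES = 271_788
NSFW_SAMPLES = 20_849
TOTAL_PAIRS = 292_637

SFW_BYTES = 19_174_765_803.112
NSFW_BYTES = 2_864_808_426.209
DATASET_BYTES = 22_039_574_229.321

SFW_HOURS = 397.54


def average_clip_seconds(hours: float, samples: int) -> float:
    """Mean clip length in seconds, given total hours and clip count."""
    return hours * 3600 / samples


if __name__ == "__main__":
    # Split sample counts should sum to the reported number of pairs.
    print("samples add up:", SFW_SAMPLES + NSFW_SAMPLES == TOTAL_PAIRS)
    # Split byte sizes should sum to the reported dataset size.
    print("bytes add up:", abs(SFW_BYTES + NSFW_BYTES - DATASET_BYTES) < 1e-3)
    # Safe-content hours divided by safe-content clips.
    print("avg sfw clip (s):", round(average_clip_seconds(SFW_HOURS, SFW_SAMPLES), 1))
```

Rounded to one decimal place, the computed safe-content average agrees with the 5.3 seconds listed under Dataset Information.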
Configuration
- Default configuration:
  - Safe content file path: data/sfw-*
  - Non‑safe content file path: data/nsfw-*
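Because the default configuration exposes `sfw` and `nsfw` as separate splits, a consumer can request only the one it needs. The following is a minimal sketch using the Hugging Face `datasets` library. This page does not state the Hub repository ID, so `repo_id` is left as a caller-supplied argument; the helper name `load_split` is hypothetical.

```python
# Split names and file patterns as listed in the default configuration.
DATA_FILES = {"sfw": "data/sfw-*", "nsfw": "data/nsfw-*"}


def load_split(repo_id: str, split: str = "sfw", streaming: bool = True):
    """Load one split of the dataset from the Hugging Face Hub.

    repo_id is an assumption supplied by the caller (the Hub namespace is
    not given on this page). Streaming avoids downloading the full
    ~24 GB archive before the first example is available.
    """
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so the split check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=streaming)


# Example (requires network access and the `datasets` package):
#   ds = load_split("<namespace>/japanese-anime-speech-v2", "sfw")
#   print(ds)
```

Defaulting to the `sfw` split keeps non-safe content out of a pipeline unless it is requested explicitly.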
Version Changes
- Changes from V1 to V2:
  - Substantial increase in size, from 73,004 to 292,637 audio‑text pairs
  - Audio format changed from mp3 (192 kbps) to mp3 (128 kbps) for storage efficiency
  - Separate splits for safe and non‑safe content
  - Normalized repeated characters
  - Removed audio lines without dialogue
  - Removed low‑quality audio lines
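The card does not say how repeated characters were normalized. As a purely illustrative sketch (not the author's method), one common approach is to cap runs of the same character at a fixed length. The function name `collapse_repeats` and the `max_run` threshold are assumptions.

```python
import re


def collapse_repeats(text: str, max_run: int = 3) -> str:
    """Shorten any run of one repeated character to at most max_run copies.

    Hypothetical normalization; the dataset's actual rule is not documented
    on this page.
    """
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    # (.) captures one character; \1{max_run,} requires at least max_run
    # further copies, so the whole match is a run longer than max_run.
    pattern = re.compile(r"(.)\1{%d,}" % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)


if __name__ == "__main__":
    print(collapse_repeats("ああああああ"))
    print(collapse_repeats("えーーーーーっ!!"))
```

Capping runs rather than removing them keeps elongation in the transcript (a drawn-out vowel is still visibly drawn out) while bounding its length.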
Biases and Limitations
- The dataset is primarily sourced from visual novels, leading to a gender bias toward female voices and vocabulary focused on love, relationships, and fantasy, which may not reflect real‑world speech patterns.
- The high audio quality may not be representative of everyday speaking conditions.
- Contains non‑safe content, making it unsuitable for some applications.
- Transcriptions are unformatted and uncleaned, which may affect text quality.
Future Plans
- Continue expanding the dataset with more sources.
Use and Citation
- The dataset is open for commercial and non‑commercial use.
- Citation is not mandatory, but providing a hyperlink to the dataset is encouraged when used in derived works.
Source
Organization: huggingface
Created: 6/26/2024