Murple/ksponspeech

The KsponSpeech dataset contains 969 hours of Korean conversational speech recorded by approximately 2,000 native Korean speakers in clean environments. All data were created by recording dialogues between two people and manually transcribing the audio. Transcriptions provide both orthographic and phonetic versions, along with disfluency tags (e.g., filler words, repeated words, word fragments) to indicate spontaneous speech. The dataset is primarily used for automatic speech recognition tasks and has been publicly released on the Korean government open data platform.

Updated 11/14/2022

Description

Dataset Overview

Dataset Name

Name: KsponSpeech

Dataset Attributes

Language: Korean (ko)
Language Creation Method: Crowdsourced
Multilinguality: Monolingual
Annotation Creation Method: Expert-generated
Size: 10K<n<100K
Source Data: Original
Task Category: Automatic Speech Recognition

Dataset Description

Summary: Contains 969 hours of general open-domain conversational speech recorded by about 2,000 native Korean speakers in clean environments. The data were constructed by recording two people freely conversing and manually transcribing the recordings. The transcription provides dual orthographic and phonetic versions, as well as disfluency tags for spontaneous speech such as filler words, repeated words, and word fragments.
Supported Tasks: Automatic Speech Recognition
Language: Korean

Dataset Structure

Data Instances: Each instance includes audio information (path, array, sample rate), text transcription, and a unique ID.
Data Fields:
- Audio: Contains the audio file path, decoded audio array, and sample rate.
- Text: Transcription of the audio file.
- ID: Unique identifier for the data sample.
Data Splits: Includes training, validation, and two evaluation sets (eval.clean and eval.other).
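The instance layout described above can be illustrated with a small sketch. It builds a mock instance in memory rather than downloading the dataset. The key names `audio`, `path`, `array`, `sampling_rate`, `text`, and `id` are assumptions based on the field descriptions, and `summarize_instance` is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict


def summarize_instance(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize one instance: its ID, audio duration, and transcript length.

    Assumes the instance has the keys "audio" (with "array" and
    "sampling_rate"), "text", and "id"; these names are illustrative.
    """
    audio = instance["audio"]
    num_samples = len(audio["array"])
    sampling_rate = audio["sampling_rate"]
    return {
        "id": instance["id"],
        # Duration in seconds = number of samples / samples per second.
        "duration_s": num_samples / sampling_rate,
        "n_chars": len(instance["text"]),
    }


# Mock instance: one second of silence at an assumed 16 kHz sampling rate.
example = {
    "id": "sample_0001",  # hypothetical ID
    "audio": {
        "path": "sample_0001.wav",  # hypothetical path
        "array": [0.0] * 16000,
        "sampling_rate": 16000,
    },
    "text": "안녕하세요",
}

summary = summarize_instance(example)
print(summary)
```

A check like this is useful before training, for example to filter out utterances that are too long or that have empty transcripts.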

Dataset Creation

Source Data: Constructed by recording two people freely conversing and manually transcribing the dialogues.
Annotations: Provide dual orthographic and phonetic transcriptions along with disfluency tags for spontaneous speech.
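As a sketch of how the dual transcription might be consumed, the snippet below selects either the orthographic or the phonetic variant from a transcript line. It assumes the dual form is written as a parenthesized pair, `(orthographic)/(phonetic)`. This page does not specify the markup, so treat the notation, the example sentence, and the helper name `select_transcription` as assumptions to check against the actual data.

```python
import re

# Assumed notation: "(orthographic)/(phonetic)", with no nested parentheses.
DUAL_PATTERN = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def select_transcription(text: str, mode: str = "orthographic") -> str:
    """Replace each dual-transcription pair with one of its two variants."""
    if mode not in ("orthographic", "phonetic"):
        raise ValueError("mode must be 'orthographic' or 'phonetic'")
    group = 1 if mode == "orthographic" else 2
    return DUAL_PATTERN.sub(lambda match: match.group(group), text)


# Illustrative line, not taken from the dataset.
line = "(70%)/(칠십 퍼센트) 정도"
orthographic = select_transcription(line, "orthographic")
phonetic = select_transcription(line, "phonetic")
print(orthographic)
print(phonetic)
```

Which variant to train on is a design choice: the phonetic form matches what was actually spoken, while the orthographic form is closer to conventional written text.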

Citation Information

@Article{app10196936,
  AUTHOR = {Bang, Jeong-Uk and Yun, Seung and Kim, Seung-Hi and Choi, Mu-Yeol and Lee, Min-Kyu and Kim, Yeo-Jeong and Kim, Dong-Hyun and Park, Jun and Lee, Young-Jik and Kim, Sang-Hun},
  TITLE = {KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition},
  JOURNAL = {Applied Sciences},
  VOLUME = {10},
  YEAR = {2020},
  NUMBER = {19},
  ARTICLE-NUMBER = {6936},
  URL = {https://www.mdpi.com/2076-3417/10/19/6936},
  ISSN = {2076-3417},
  ABSTRACT = {This paper introduces a large-scale spontaneous speech corpus of Korean, named KsponSpeech. This corpus contains 969 h of general open-domain dialog utterances, spoken by about 2000 native Korean speakers in a clean environment. All data were constructed by recording the dialogue of two people freely conversing on a variety of topics and manually transcribing the utterances. The transcription provides a dual transcription consisting of orthography and pronunciation, and disfluency tags for spontaneity of speech, such as filler words, repeated words, and word fragments. This paper also presents the baseline performance of an end-to-end speech recognition model trained with KsponSpeech. In addition, we investigated the performance of standard end-to-end architectures and the number of sub-word units suitable for Korean. We investigated issues that should be considered in spontaneous speech recognition in Korean. KsponSpeech is publicly available on an open data hub site of the Korea government.},
  DOI = {10.3390/app10196936}
}

Topics

Speech Recognition

Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
