AILAB-VNUHCM/vivos
VIVOS is a free Vietnamese speech corpus containing 15 hours of recordings, prepared for Vietnamese automatic speech recognition. It was compiled by AILAB, a computer science lab at VNU‑HCM – University of Science, with the aim of attracting more researchers to Vietnamese speech recognition problems. Each entry includes the audio file, its transcript, the speaker ID, and the file path, and the data is split into training and test sets. The dataset is released under the CC BY‑NC‑SA 4.0 license for non‑commercial use.
Dataset Card for VIVOS
Dataset Description
Overview
VIVOS is a free Vietnamese speech corpus comprising 15 hours of recordings for Vietnamese automatic speech recognition. The corpus was prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science, led by Prof. Vu Hai Quan.
Supported Tasks and Leaderboards
[Further information needed]
Language
Vietnamese
Dataset Structure
Data Instances
A typical entry includes the audio file path (field path) and its transcription (field sentence). Additional metadata about the speaker and paragraph containing the transcript are also provided.
{
  "speaker_id": "VIVOSSPK01",
  "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
  "audio": {
    "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
    "array": [
      -0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449
    ],
    "sampling_rate": 16000
  },
  "sentence": "KHÁCH SẠN"
}
Data Fields
speaker_id: speaker identifier
path: audio file path
audio: dictionary containing the audio file path, the decoded audio array, and the sampling rate
sentence: text that the speaker was prompted to read
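As a minimal sketch of how these fields relate, the length of a clip in seconds can be derived from the decoded array and the sampling rate. The record below is a hypothetical stand-in with the same field names as the data instance above (the array is synthetic silence and the path is shortened), and the helper name utterance_duration_seconds is an assumption made for illustration, not part of the dataset or any library.

```python
from typing import Any, Dict


def utterance_duration_seconds(record: Dict[str, Any]) -> float:
    """Return the clip length in seconds: number of samples / sampling rate."""
    audio = record["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Hypothetical record mirroring the documented fields; the audio array is
# synthetic (32,000 zero samples), not real VIVOS data.
example = {
    "speaker_id": "VIVOSSPK01",
    "path": "VIVOSSPK01_R001.wav",
    "audio": {
        "path": "VIVOSSPK01_R001.wav",
        "array": [0.0] * 32000,
        "sampling_rate": 16000,
    },
    "sentence": "KHÁCH SẠN",
}

duration = utterance_duration_seconds(example)
word_count = len(example["sentence"].split())
print(f"{example['speaker_id']}: {duration:.2f} s, {word_count} words")
```

The same function should work on a real record as long as the decoded array is a flat sequence of samples, since it only relies on the array length and the sampling_rate field.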
Data Splits
Speech material is divided into training and test sets.
| | Training | Test |
|---|---|---|
| Number of speakers | 46 | 19 |
| Number of utterances | 11,660 | 760 |
| Duration (hh:mm) | 14:55 | 00:45 |
| Unique syllables | 4,617 | 1,692 |
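The figures in the table above can be turned into rough per-utterance and per-speaker averages. The sketch below assumes the durations are given as hours:minutes, which is consistent with the stated 15-hour total; the dictionary layout and the helper name hhmm_to_seconds are illustrative choices, not part of the dataset.

```python
def hhmm_to_seconds(text: str) -> int:
    """Convert an 'hh:mm' string to seconds (assumes hours:minutes format)."""
    hours, minutes = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60


# Values copied from the split table; the structure is hypothetical.
splits = {
    "train": {"speakers": 46, "utterances": 11660, "duration": "14:55"},
    "test": {"speakers": 19, "utterances": 760, "duration": "00:45"},
}

summary = {}
for name, stats in splits.items():
    seconds = hhmm_to_seconds(stats["duration"])
    summary[name] = {
        "avg_utterance_seconds": seconds / stats["utterances"],
        "utterances_per_speaker": stats["utterances"] / stats["speakers"],
    }

for name, values in summary.items():
    print(
        f"{name}: {values['avg_utterance_seconds']:.2f} s per utterance, "
        f"{values['utterances_per_speaker']:.1f} utterances per speaker"
    )
```

Under that assumption, training utterances average roughly 4.6 seconds and test utterances roughly 3.6 seconds, which suggests most clips are short prompted phrases or sentences.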
Dataset Creation
Rationale
[Further information needed]
Source Data
Initial Collection and Normalization
[Further information needed]
Who are the source language producers?
[Further information needed]
Annotation
Annotation Process
[Further information needed]
Who are the annotators?
[Further information needed]
Personal and Sensitive Information
The dataset contains voices of volunteers. Users agree not to attempt to identify the speakers.
Considerations for Using the Data
Societal Impact
[Further information needed]
Discussion of Biases
[Further information needed]
Other Known Limitations
The dataset is intended for research purposes only. See the license for more details.
Additional Information
Curators
The dataset was originally prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science.
License
Creative Commons Attribution NonCommercial ShareAlike v4.0 (CC BY‑NC‑SA 4.0)
Citation
@inproceedings{luong-vu-2016-non,
    title = "A non-expert {K}aldi recipe for {V}ietnamese Speech Recognition System",
    author = "Luong, Hieu-Thi and
      Vu, Hai-Quan",
    booktitle = "Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies ({WLSI}/{OIAF}4{HLT}2016)",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-5207",
    pages = "51--55",
}
Contributions
Thanks to @binh234 for adding this dataset.