AILAB-VNUHCM/vivos
VIVOS is a free Vietnamese speech corpus containing 15 hours of recordings, prepared for Vietnamese automatic speech recognition. The corpus was compiled by AILAB, a computer science lab at VNU‑HCM – University of Science, with the aim of attracting more researchers to Vietnamese speech recognition problems. It provides audio files with their transcripts, speaker IDs, and file paths, split into training and test sets. The dataset is released under a CC BY‑NC‑SA 4.0 license for non‑commercial use.
Description
Dataset Card for VIVOS
Dataset Description
Overview
VIVOS is a free Vietnamese speech corpus comprising 15 hours of recordings for Vietnamese automatic speech recognition. The corpus was prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science, led by Prof. Vu Hai Quan.
Supported Tasks and Leaderboards
[Further information needed]
Language
Vietnamese
Dataset Structure
Data Instances
A typical entry includes the audio file path (field path) and its transcription (field sentence). The speaker identifier (field speaker_id) and the decoded audio (field audio) are also provided.
{
  "speaker_id": "VIVOSSPK01",
  "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
  "audio": {
    "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
    "array": [
      -0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449
    ],
    "sampling_rate": 16000
  },
  "sentence": "KHÁCH SẠN"
}
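As a rough illustration of how the audio field can be used, the length of a clip follows from the number of samples in array divided by sampling_rate. The sketch below assumes a record shaped like the instance above; the helper name clip_duration_seconds, the shortened path, and the one-second silent array are made up for illustration and are not part of the dataset.

```python
from typing import Any, Mapping


def clip_duration_seconds(example: Mapping[str, Any]) -> float:
    """Return the clip length in seconds: sample count / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Hypothetical record shaped like the instance above. The path and the
# silent one-second array are placeholders, not real VIVOS data.
example = {
    "speaker_id": "VIVOSSPK01",
    "path": "VIVOSSPK01_R001.wav",
    "audio": {
        "path": "VIVOSSPK01_R001.wav",
        "array": [0.0] * 16000,
        "sampling_rate": 16000,
    },
    "sentence": "KHÁCH SẠN",
}

print(clip_duration_seconds(example))  # 1.0
```

The same helper should work on any record that exposes array as a sized sequence, such as a Python list or a NumPy array.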
Data Fields
speaker_id: speaker identifier
path: audio file path
audio: dictionary containing the audio file path, the decoded audio array, and the sampling rate
sentence: text that the speaker was prompted to read
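One way to guard a processing pipeline against malformed records is to check each one against the fields listed above before use. The following is a minimal sketch under that assumption; the names TOP_LEVEL_FIELDS, AUDIO_KEYS, and missing_fields are hypothetical helpers, not part of any library.

```python
from typing import Any, List, Mapping

# Expected top-level fields and the keys of the nested audio dictionary,
# taken from the field list in this card.
TOP_LEVEL_FIELDS = ("speaker_id", "path", "audio", "sentence")
AUDIO_KEYS = ("path", "array", "sampling_rate")


def missing_fields(example: Mapping[str, Any]) -> List[str]:
    """List the expected fields that are absent from a record."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in example]
    audio = example.get("audio", {})
    missing += [f"audio.{key}" for key in AUDIO_KEYS if key not in audio]
    return missing


# Hypothetical record whose audio dictionary was only partly populated.
incomplete = {
    "speaker_id": "VIVOSSPK01",
    "path": "VIVOSSPK01_R001.wav",
    "audio": {"path": "VIVOSSPK01_R001.wav"},
    "sentence": "KHÁCH SẠN",
}

print(missing_fields(incomplete))  # ['audio.array', 'audio.sampling_rate']
```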
Data Splits
Speech material is divided into training and test sets.
| | Training | Test |
|---|---|---|
| Number of speakers | 46 | 19 |
| Number of utterances | 11,660 | 760 |
| Duration (hh:mm) | 14:55 | 00:45 |
| Unique syllables | 4,617 | 1,692 |
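The figures in the table allow a quick back-of-the-envelope comparison of the two splits. The sketch below assumes the Duration row is given in hours:minutes (consistent with the 15 hours stated in the overview) and derives the mean utterance length and the number of utterances per speaker; the function names are illustrative only.

```python
def to_seconds(hh_mm: str) -> int:
    """Convert an 'hh:mm' string to seconds (assumes hours:minutes)."""
    hours, minutes = (int(part) for part in hh_mm.split(":"))
    return hours * 3600 + minutes * 60


# Values copied from the split table above.
splits = {
    "train": {"speakers": 46, "utterances": 11660, "duration": "14:55"},
    "test": {"speakers": 19, "utterances": 760, "duration": "00:45"},
}


def summarize(stats: dict) -> dict:
    """Derive per-utterance and per-speaker averages for one split."""
    seconds = to_seconds(stats["duration"])
    return {
        "mean_utterance_seconds": round(seconds / stats["utterances"], 2),
        "utterances_per_speaker": round(stats["utterances"] / stats["speakers"], 1),
    }


for name, stats in splits.items():
    print(name, summarize(stats))
# train {'mean_utterance_seconds': 4.61, 'utterances_per_speaker': 253.5}
# test {'mean_utterance_seconds': 3.55, 'utterances_per_speaker': 40.0}
```

Under these assumptions, test utterances are somewhat shorter on average, and each test speaker contributes far fewer utterances than a training speaker.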
Dataset Creation
Rationale
[Further information needed]
Source Data
Initial Collection and Normalization
[Further information needed]
Who are the source language producers?
[Further information needed]
Annotation
Annotation Process
[Further information needed]
Who are the annotators?
[Further information needed]
Personal and Sensitive Information
The dataset contains voices of volunteers. Users agree not to attempt to identify the speakers.
Considerations for Using the Data
Societal Impact
[Further information needed]
Discussion of Biases
[Further information needed]
Other Known Limitations
The dataset is intended for research purposes only. See the license for more details.
Additional Information
Curators
The dataset was originally prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science.
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY‑NC‑SA 4.0)
Citation
@inproceedings{luong-vu-2016-non,
    title = "A non-expert {K}aldi recipe for {V}ietnamese Speech Recognition System",
    author = "Luong, Hieu-Thi and Vu, Hai-Quan",
    booktitle = "Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies ({WLSI}/{OIAF}4{HLT}2016)",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-5207",
    pages = "51--55",
}
Contributions
Thanks to @binh234 for adding this dataset.