AILAB-VNUHCM/vivos

VIVOS is a free Vietnamese audio corpus containing 15 hours of recordings, prepared for Vietnamese automatic speech recognition tasks. The corpus was compiled by the AILAB lab at VNU‑HCM – University of Science, aiming to attract researchers to address Vietnamese speech recognition challenges. It includes audio files, corresponding transcripts, speaker IDs, and file paths, split into training and test sets. The dataset is released under a CC BY‑NC‑SA 4.0 license for non‑commercial use.

Updated 6/14/2023

Dataset Card for VIVOS

Dataset Description

Overview

VIVOS is a free Vietnamese speech corpus comprising 15 hours of recordings for Vietnamese automatic speech recognition. The corpus was prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science, led by Prof. Vu Hai Quan.

Supported Tasks and Leaderboards

[Further information needed]

Language

Vietnamese

Dataset Structure

Data Instances

A typical entry consists of the path to the audio file (field path) and its transcription (field sentence), together with the speaker identifier (field speaker_id) and the decoded audio (field audio).

{
  "speaker_id": "VIVOSSPK01",
  "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
  "audio": {
    "path": "/home/admin/.cache/huggingface/datasets/downloads/extracted/.../VIVOSSPK01_R001.wav",
    "array": [
      -0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449
    ],
    "sampling_rate": 16000
  },
  "sentence": "KHÁCH SẠN"
}
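The audio field carries enough information to recover the length of a clip: the number of samples in array divided by sampling_rate. The following is a minimal sketch of that arithmetic, not part of the dataset tooling. The function name utterance_duration_seconds and the synthetic record (1.5 seconds of silence with a shortened, hypothetical path) are assumptions made for illustration.

```python
from typing import Any, Dict, Sequence


def utterance_duration_seconds(audio: Dict[str, Any]) -> float:
    """Return the clip length in seconds: number of samples / sampling rate."""
    samples: Sequence[float] = audio["array"]
    rate: int = audio["sampling_rate"]
    if rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return len(samples) / rate


# Synthetic stand-in for one record (hypothetical path, silent audio).
# 24,000 samples at 16,000 Hz correspond to 1.5 seconds.
example = {
    "speaker_id": "VIVOSSPK01",
    "path": "VIVOSSPK01_R001.wav",
    "audio": {
        "path": "VIVOSSPK01_R001.wav",
        "array": [0.0] * 24000,
        "sampling_rate": 16000,
    },
    "sentence": "KHÁCH SẠN",
}

print(utterance_duration_seconds(example["audio"]))  # 1.5
```

The same function should work on a real record as long as array supports len(), which holds for Python lists and NumPy arrays.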

Data Fields

speaker_id: speaker identifier
path: audio file path
audio: dictionary containing audio file path, decoded audio array, and sampling rate
sentence: text that the speaker was prompted to read
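A quick structural check against the fields listed above can catch malformed records before training. This is a sketch under the assumption that each record is a plain dictionary shaped like the example instance. The helper missing_fields and the two constant names are hypothetical and not part of any library.

```python
from typing import Any, Dict, List

# Field names taken from the list above; nested keys from the audio dictionary.
EXPECTED_FIELDS = ("speaker_id", "path", "audio", "sentence")
EXPECTED_AUDIO_KEYS = ("path", "array", "sampling_rate")


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """List expected keys absent from a record.

    Missing nested audio keys are reported as 'audio.<key>'.
    """
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    audio = record.get("audio")
    if isinstance(audio, dict):
        missing += [f"audio.{key}" for key in EXPECTED_AUDIO_KEYS if key not in audio]
    return missing


# An incomplete record: no top-level path or sentence, and a partial audio dict.
incomplete = {"speaker_id": "VIVOSSPK01", "audio": {"array": []}}
print(missing_fields(incomplete))
```

An empty list means the record has every expected key. The check looks only at key presence, not at value types or audio content.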

Data Splits

Speech material is divided into training and test sets.

Statistic	Training	Test
Number of speakers	46	19
Number of utterances	11,660	760
Duration (hh:mm)	14:55	00:45
Unique syllables	4,617	1,692
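The split statistics above allow a few derived figures, such as the mean utterance length and the number of utterances per speaker. The sketch below works them out from the table values. It assumes the Duration row is given as hours:minutes, which is consistent with the roughly 15 hours stated for the corpus. The SplitStats class and its method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitStats:
    speakers: int
    utterances: int
    duration: str  # "hh:mm", assumed to mean hours:minutes

    def duration_seconds(self) -> int:
        """Convert the hh:mm duration string to seconds."""
        hours, minutes = (int(part) for part in self.duration.split(":"))
        return hours * 3600 + minutes * 60

    def mean_utterance_seconds(self) -> float:
        """Average clip length: total duration / number of utterances."""
        return self.duration_seconds() / self.utterances

    def utterances_per_speaker(self) -> float:
        """Average number of utterances recorded by each speaker."""
        return self.utterances / self.speakers


# Values copied from the table above.
TRAIN = SplitStats(speakers=46, utterances=11660, duration="14:55")
TEST = SplitStats(speakers=19, utterances=760, duration="00:45")

for name, split in (("train", TRAIN), ("test", TEST)):
    print(
        f"{name}: {split.mean_utterance_seconds():.2f} s per utterance, "
        f"{split.utterances_per_speaker():.1f} utterances per speaker"
    )
```

Because the durations are rounded to the minute, the per-utterance averages are approximate.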

Dataset Creation

Rationale

[Further information needed]

Source Data

Initial Collection and Normalization

[Further information needed]

Who collected the source language?

[Further information needed]

Annotation

Annotation Process

[Further information needed]

Who are the annotators?

[Further information needed]

Personal and Sensitive Information

The dataset consists of voice recordings donated by volunteers. By using the dataset, you agree not to attempt to identify the speakers.

Considerations for Using the Data

Societal Impact

[Further information needed]

Discussion of Biases

[Further information needed]

Other Known Limitations

The dataset is intended for research purposes only. See the license for more details.

Additional Information

Curators

The dataset was originally prepared by AILAB, the computer‑science lab at VNU‑HCM – University of Science.

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY‑NC‑SA 4.0)

Citation

@inproceedings{luong-vu-2016-non,
    title = "A non-expert {K}aldi recipe for {V}ietnamese Speech Recognition System",
    author = "Luong, Hieu-Thi  and
      Vu, Hai-Quan",
    booktitle = "Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies ({WLSI}/{OIAF}4{HLT}2016)",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-5207",
    pages = "51--55",
}

Contributions

Thanks to @binh234 for adding this dataset.


Topics

Automatic Speech Recognition

Vietnamese

Source

Organization: hugging_face

Created: Unknown
