High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

AILAB-VNUHCM/vivos

VIVOS is a free Vietnamese speech corpus containing 15 hours of recordings, prepared for Vietnamese automatic speech recognition (ASR) tasks. The corpus was compiled by AILAB at VNU-HCM – University of Science to encourage more researchers to work on Vietnamese speech recognition. Each example includes an audio file, its transcript, a speaker ID, and a file path, and the data is split into training and test sets. The dataset is released under a CC BY-NC-SA 4.0 license, which permits non-commercial use only.
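
As a rough illustration of how the fields described above might be used, the following Python sketch loads one split through the Hugging Face `datasets` library and counts utterances per speaker. The column names `speaker_id` and `sentence` are assumptions based on the description, and `load_vivos` and `utterances_per_speaker` are hypothetical helper names introduced only for this example. Depending on the installed `datasets` version, loading may need additional arguments.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Dataset identifier as listed on this page.
DATASET_ID = "AILAB-VNUHCM/vivos"


def load_vivos(split: str = "train"):
    """Load one split ("train" or "test") from the Hugging Face Hub.

    Requires network access and the `datasets` package, so the import
    is done lazily inside the function.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def utterances_per_speaker(records: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many utterances each speaker contributed.

    Assumes every record is a dict with a "speaker_id" key
    (an assumed column name, not confirmed by this page).
    """
    return Counter(record["speaker_id"] for record in records)
```

To try it, call `load_vivos("train")` and pass the result to `utterances_per_speaker`. The helper also works on any list of dictionaries with a `speaker_id` key, which makes it easy to check without downloading the audio.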

Source: Hugging Face
