MOS-Bench
MOS‑Bench is a collection of datasets for training and evaluating the generalization ability of subjective speech quality assessment (SSQA) models, developed by the Toda Laboratory at Nagoya University. The collection comprises seven training sets and twelve test sets covering a range of sampling rates, languages, and speech types: synthetic speech generated by text‑to‑speech (TTS), voice conversion (VC), and speech enhancement (SE) systems, as well as non‑synthetic speech affected by transmission artifacts, noise, and reverberation. It was created by integrating and processing multiple listening‑test datasets, with the goal of addressing the poor generalization of SSQA models on unseen data. MOS‑Bench is used in speech processing research, especially for work on subjective speech quality evaluation.
Description
🗣️ SHEET / MOS-Bench 🎧
Dataset Overview
- MOS-Bench is a benchmark for evaluating the generalization capability of subjective speech quality assessment (SSQA) models.
- SHEET is a toolkit for conducting research experiments with MOS-Bench.
Key Features
- MOS-Bench is the first large‑scale collection of SSQA training and testing datasets, covering a wide range of domains, including synthetic speech generated by text‑to‑speech (TTS), voice conversion (VC), and singing voice synthesis (SVS) systems, as well as distorted speech with artificial and real noise, clipping, transmission, reverberation, and other degradations.
- The repository focuses on providing training recipes: although many ready‑made speech quality assessment tools exist, such as DNSMOS, SpeechMOS, and speechmetrics, most do not come with training recipes and are therefore of limited use for research.
MOS-Bench Overview
- MOS-Bench currently contains 7 training sets and 12 test sets.
Supported Models and Features
Models
- LDNet
- Original repository: https://github.com/unilight/LDNet
- Paper: arXiv
- Example config:
egs/bvcc/conf/ldnet-ml.yaml
- SSL-MOS
- Original repository: https://github.com/nii-yamagishilab/mos-finetune-ssl/tree/main
- Paper: arXiv
- Example config:
egs/bvcc/conf/ssl-mos-wav2vec2.yaml
- Note: Some modifications were made to the original implementation.
- UTMOS (Strong learner)
- Original repository: https://github.com/sarulab-speech/UTMOS22/tree/master/strong
- Paper: arXiv
- Example config:
egs/bvcc/conf/utmos-strong.yaml
- Note: Not all components of UTMOS strong are implemented.
- Modified AlignNet
- Original repository: https://github.com/NTIA/alignnet
- Paper: arXiv
- Example config:
egs/bvcc+nisqa+pstn+singmos+somos+tencent+tmhint-qi/conf/alignnet-wav2vec2.yaml
Features
- Modeling
- Listener modeling
- SSL‑based encoders supported via S3PRL
- Training
- Automatic saving of the best model and early stopping
- Visualization, including prediction score distribution and scatter plots of utterance‑ and system‑level scores
- Model averaging
- Model ensembling via stacking
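The utterance‑ and system‑level scoring mentioned above follows the usual SSQA convention: system‑level scores are the mean of utterance‑level scores for each system, and rank correlation (e.g., Spearman) is then computed across systems. The sketch below illustrates that convention in plain Python; the function names and toy data are illustrative, not part of SHEET's API.

```python
def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation
    of ranks (no tie handling -- assumes distinct values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def system_level(scores):
    """Aggregate utterance-level scores per system: {sys: [scores]} -> {sys: mean}."""
    return {s: sum(v) / len(v) for s, v in scores.items()}

# Toy example: ground-truth MOS vs. model predictions for three systems.
truth = {"sysA": [4.0, 4.5, 4.2], "sysB": [2.0, 2.5, 2.2], "sysC": [3.0, 3.2]}
pred  = {"sysA": [3.8, 4.4, 4.0], "sysB": [2.1, 2.6, 2.4], "sysC": [3.2, 3.4]}

t_sys, p_sys = system_level(truth), system_level(pred)
systems = sorted(t_sys)
srcc = spearman([t_sys[s] for s in systems], [p_sys[s] for s in systems])
print(srcc)  # -> 1.0 (perfect rank agreement on this toy data)
```

System‑level correlation is usually the headline metric in SSQA papers, since per‑utterance ratings are noisy but system rankings are what TTS/VC/SE developers act on.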
Usage Guide
- New users: Provides complete experiment recipes, including scripts for downloading and processing datasets, and for training and evaluating models.
- Existing model users: Provides convenient scripts for collecting test sets.
- Using pre‑trained models: Offers functionality to load pre‑trained SSQA models and predict scores via torch.hub.
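Loading through torch.hub typically looks like the sketch below. The repository tag, entry‑point name, and predictor method are assumptions for illustration, not verified values; check the SHEET README for the exact ones.

```python
def load_mos_predictor(repo="unilight/sheet:v0.1.0", entry="default"):
    """Load a pre-trained SSQA predictor through torch.hub.

    The repo tag and entry-point name here are illustrative assumptions;
    consult the SHEET documentation for the published values.
    """
    import torch  # imported lazily so the sketch itself has no hard torch dependency
    return torch.hub.load(repo, entry, trust_repo=True)

# Usage (downloads model weights on first call):
# predictor = load_mos_predictor()
# score = predictor.predict(wav_path="/path/to/speech.wav")  # hypothetical method name
```

The first call caches the hub repo and checkpoint locally, so subsequent loads are offline.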
Installation
- Perform an editable installation with git clone and make, which automatically builds a virtual environment.
Information
- Citation: If you use the training scripts, benchmark scripts, or pre‑trained models from this project, please cite the associated papers.
- Acknowledgements: This project was inspired by repositories such as ESPNet and ParallelWaveGAN.
- Authors: Wen‑Chin Huang, Toda Laboratory, Nagoya University.
Source
Organization: arXiv
Created: 11/6/2024