
MOS-Bench

MOS‑Bench is a collection of datasets for training and evaluating the generalization ability of subjective speech quality assessment (SSQA) models, developed at Nagoya University. The collection comprises seven training sets and twelve test sets covering a range of sampling rates, languages, and speech types: synthetic speech generated by text‑to‑speech (TTS), voice conversion (VC), and speech enhancement (SE) systems, as well as non‑synthetic speech such as transmitted, noisy, and reverberant speech. It was created by integrating and processing multiple listening test datasets, with the aim of addressing the generalization challenges SSQA models face on unseen data. MOS‑Bench is widely used in speech processing research, particularly for studies of subjective speech quality evaluation.

Updated 11/6/2024
arXiv

Description

🗣️ SHEET / MOS-Bench 🎧

Dataset Overview

  • MOS-Bench is a benchmark for evaluating the generalization capability of subjective speech quality assessment (SSQA) models.
  • SHEET is a toolkit for conducting research experiments with MOS-Bench.

Key Features

  • MOS-Bench is the first large‑scale collection of SSQA training and testing datasets, covering a wide range of domains, including synthetic speech generated by text‑to‑speech (TTS), voice conversion (VC), and singing voice synthesis (SVS) systems, as well as distorted speech with artificial and real noise, clipping, transmission, reverberation, and other degradations.
  • The repository provides full training recipes. Many ready‑made speech quality assessment tools exist, such as DNSMOS, SpeechMOS, and speechmetrics, but most ship only inference code without training recipes, which makes them ill‑suited to research that requires retraining or modifying models.

MOS-Bench Overview

  • MOS-Bench currently contains 7 training sets and 12 test sets.

Supported Features

  • Modeling
    • Listener modeling
    • SSL‑based encoders via S3PRL
  • Training
    • Automatic saving of the best model and early stopping
    • Visualization, including prediction score distribution and scatter plots of utterance‑ and system‑level scores
    • Model averaging
    • Model ensembling via stacking
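The utterance‑ and system‑level scores mentioned above are related by simple aggregation: a system‑level score is the mean of that system's utterance‑level scores, and rank correlations such as SRCC are then computed between true and predicted system scores. A minimal, stdlib‑only sketch (toy data and helper names are our own, not from the toolkit):

```python
# Sketch: aggregating utterance-level MOS to system level, then scoring
# the predictions with Spearman rank correlation (SRCC). Pure stdlib.
from collections import defaultdict

def system_scores(utt_scores):
    """Average utterance-level scores per system: {(system, utt): score} -> {system: mean}."""
    buckets = defaultdict(list)
    for (system, _utt), score in utt_scores.items():
        buckets[system].append(score)
    return {s: sum(v) / len(v) for s, v in buckets.items()}

def spearman(x, y):
    """Spearman rank correlation (no tie correction, for illustration only)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Toy listening-test data: true vs. predicted utterance-level MOS.
true = {("sysA", "u1"): 4.0, ("sysA", "u2"): 4.5, ("sysB", "u1"): 2.0,
        ("sysB", "u2"): 2.5, ("sysC", "u1"): 3.0, ("sysC", "u2"): 3.5}
pred = {("sysA", "u1"): 3.8, ("sysA", "u2"): 4.2, ("sysB", "u1"): 2.4,
        ("sysB", "u2"): 2.1, ("sysC", "u1"): 3.1, ("sysC", "u2"): 3.3}
t, p = system_scores(true), system_scores(pred)
systems = sorted(t)
srcc = spearman([t[s] for s in systems], [p[s] for s in systems])
print(round(srcc, 3))  # the toy predictions rank all three systems correctly -> 1.0
```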

Usage Guide

  • New users: Provides complete experiment recipes, including scripts for downloading and processing datasets, and for training and evaluating models.
  • Existing model users: Provides convenient scripts for collecting test sets.
  • Using pre‑trained models: Offers functionality to load pre‑trained SSQA models and predict scores via torch.hub.
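Loading a pre‑trained predictor through torch.hub might look like the sketch below; the repository tag, entry‑point name, and predict call are assumptions based on the SHEET repository and should be checked against its hubconf:

```python
def load_predictor(repo="unilight/sheet:v0.1.0", entry="default"):
    """Load a pre-trained SSQA model via torch.hub (downloads on first use).

    The repo tag and entry-point name here are assumptions; consult the
    SHEET repository's hubconf.py for the exact identifiers.
    """
    import torch  # imported lazily so the rest of the sketch stays stdlib-only
    return torch.hub.load(repo, entry, trust_repo=True)

def clamp_mos(score, lo=1.0, hi=5.0):
    """MOS predictions are conventionally reported on a 1-5 scale."""
    return max(lo, min(hi, score))

# Usage (requires torch and network access); the .predict(wav_path=...)
# signature is likewise an assumption, not a documented API:
#   predictor = load_predictor()
#   mos = clamp_mos(predictor.predict(wav_path="sample.wav"))
print(clamp_mos(5.7))  # an out-of-range raw prediction is clipped to 5.0
```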

Installation

  • Clone the repository with git clone and run make, which automatically builds a virtual environment and installs the toolkit in editable mode.
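Assuming the toolkit lives at the usual SHEET repository (the URL is an assumption; verify it before use), the installation boils down to:

```shell
git clone https://github.com/unilight/sheet.git
cd sheet
make  # builds a virtual environment and installs the package in editable mode
```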

Information

  • Citation: If you use the training scripts, benchmark scripts, or pre‑trained models from this project, please cite the associated papers.
  • Acknowledgements: This project was inspired by repositories such as ESPNet and ParallelWaveGAN.
  • Authors: Wen‑Chin Huang, Toda Laboratory, Nagoya University.


Topics

Speech Processing
Speech Quality Assessment

