renumics/dcase23-task2-enriched

This is an enriched version of the DCASE 2023 Challenge Task 2 dataset. It targets audio tasks such as anomaly detection, abnormal sound detection, and machine condition monitoring. The dataset augments the original MIMII DG and ToyADMOS2 data with embeddings generated by a pretrained audio spectrogram transformer and with benchmark results from the official challenge, to support research in unsupervised learning and domain generalization.

Updated 6/6/2023

Dataset Description

Summary

  • Name: Enriched DCASE 2023 Challenge Task 2 Dataset
  • Category: Audio Classification
  • Size: 1K < n < 10K
  • Tags: Anomaly detection, abnormal sound detection, acoustic condition monitoring, machine fault diagnosis, machine learning, unsupervised learning, acoustic scene classification, acoustic event detection, acoustic signal processing, audio domain transfer, domain generalization
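  • License: CC‑BY‑4.0 (as tagged for this enriched dataset; the original data carries a different license, see License Information)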

Structure

Data Instances

  • Audio: Mono, 10 s duration
  • Path: Audio file path
  • Section: Integer indicating section
  • d1p: Parameter name
  • d1v: Parameter value
  • Domain: Integer (0 = source, 1 = target)
  • Class: Integer indicating machine type
  • Label: Integer (0 = normal, 1 = abnormal)
  • Anomaly Index: Integer from local outlier factor algorithm
  • Anomaly Score: Float from local outlier factor algorithm
  • Embedding: Audio embedding generated by an audio spectrogram transformer
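
The fields above can be pictured as a typed record. The following is a minimal sketch, not the dataset's actual schema: the attribute names, the type of the d1v value, and the sample records are assumptions made up for illustration. Only the integer encodings for Domain and Label come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Integer encodings as stated in the field list.
DOMAIN_NAMES = {0: "source", 1: "target"}
LABEL_NAMES = {0: "normal", 1: "abnormal"}


@dataclass
class Clip:
    """One data instance; attribute names are hypothetical."""
    path: str
    section: int
    d1p: str            # parameter name
    d1v: str            # parameter value (string type is an assumption)
    domain: int         # 0 = source, 1 = target
    machine_class: int  # integer machine type
    label: int          # 0 = normal, 1 = abnormal
    anomaly_index: int
    anomaly_score: float
    embedding: List[float]

    @property
    def domain_name(self) -> str:
        return DOMAIN_NAMES[self.domain]

    @property
    def label_name(self) -> str:
        return LABEL_NAMES[self.label]


def select(clips: List[Clip],
           domain: Optional[int] = None,
           label: Optional[int] = None) -> List[Clip]:
    """Keep clips matching the given domain and/or label (None = no filter)."""
    return [
        c for c in clips
        if (domain is None or c.domain == domain)
        and (label is None or c.label == label)
    ]


# Made-up records, not taken from the dataset.
clips = [
    Clip("a.wav", 0, "speed", "6", 0, 2, 0, 0, 1.02, [0.1, 0.2]),
    Clip("b.wav", 0, "speed", "10", 1, 2, 1, 1, 1.87, [0.9, 0.4]),
]
target_abnormal = select(clips, domain=1, label=1)
print([c.path for c in target_abnormal])
```

Filtering by domain and label in this way is a common first step when comparing source-domain and target-domain behaviour.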

Data Splits

  • Development Set: Train and test splits
    • Train: 7,000 instances
    • Test: 1,400 instances
  • Additional Training Set: Train only, 7,000 instances
  • Evaluation Set: Test only, 1,400 instances
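
As a quick sanity check, the split sizes listed above can be tallied per split type. This is only a sketch: the dictionary keys are hypothetical labels chosen for illustration and are not the dataset's configuration names.

```python
from typing import Dict, Optional

# Instance counts copied from the list above; keys are hypothetical labels.
SPLITS: Dict[str, Dict[str, int]] = {
    "development": {"train": 7000, "test": 1400},
    "additional_training": {"train": 7000},
    "evaluation": {"test": 1400},
}


def total_instances(splits: Dict[str, Dict[str, int]],
                    split_name: Optional[str] = None) -> int:
    """Sum instance counts, optionally restricted to 'train' or 'test'."""
    return sum(
        count
        for parts in splits.values()
        for name, count in parts.items()
        if split_name is None or name == split_name
    )


print("train:", total_instances(SPLITS, "train"))
print("test:", total_instances(SPLITS, "test"))
print("all:", total_instances(SPLITS))
```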

Creation

Source Data

  • Includes normal and abnormal sounds from seven machine types; each machine type is represented by one section containing training and test data.
  • Recordings contain machine operation sounds and environmental noise.

Supported Tasks & Leaderboard

  • Task: Abnormal sound detection for machine condition monitoring
  • Requirements: Unsupervised learning, domain generalization, training a model for a completely new machine type, and training with data from only a single machine of each machine type
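
To illustrate the unsupervised setting, the sketch below fits a local outlier factor (LOF) model on "normal" training embeddings and scores unseen clips against it. It is a generic pure-Python LOF written for illustration, not necessarily how the dataset's anomaly score field was computed. The 2-D vectors stand in for real embeddings, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Neighbours = List[Tuple[float, int]]  # (distance, index into training set)


def _knn(point: Vector, data: Sequence[Vector], k: int,
         skip: int = -1) -> Neighbours:
    """Return the k nearest training points, optionally skipping one index."""
    dists = [(math.dist(point, other), j)
             for j, other in enumerate(data) if j != skip]
    dists.sort()
    return dists[:k]


def _lrd(neighbours: Neighbours, k_dist: List[float]) -> float:
    """Local reachability density: inverse of the mean reachability distance."""
    mean_reach = sum(max(k_dist[j], d) for d, j in neighbours) / len(neighbours)
    # Duplicate points give a zero mean; this sketch does not handle that case
    # beyond returning infinity.
    return 1.0 / mean_reach if mean_reach > 0 else math.inf


def fit_lof(train: Sequence[Vector], k: int) -> Tuple[List[float], List[float]]:
    """Precompute each training point's k-distance and density."""
    neighbours = [_knn(p, train, k, skip=i) for i, p in enumerate(train)]
    k_dist = [nb[-1][0] for nb in neighbours]
    lrd = [_lrd(nb, k_dist) for nb in neighbours]
    return k_dist, lrd


def lof_score(query: Vector, train: Sequence[Vector], k: int,
              k_dist: List[float], lrd: List[float]) -> float:
    """LOF of an unseen point; values well above 1 suggest an outlier."""
    nb = _knn(query, train, k)
    query_lrd = _lrd(nb, k_dist)
    return sum(lrd[j] for _, j in nb) / (len(nb) * query_lrd)


# Toy "normal" training embeddings (made up).
train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
k = 2
k_dist, lrd = fit_lof(train, k)

near = lof_score([0.5, 0.4], train, k, k_dist, lrd)  # inside the cluster
far = lof_score([5.0, 5.0], train, k, k_dist, lrd)   # far from the cluster
print(f"near={near:.3f} far={far:.3f}")
```

Because only normal data is needed to fit the model, the same pattern applies when no abnormal examples are available for a machine type. In practice a library implementation would usually be preferable to this quadratic-time sketch.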

Considerations

Social Impact

  • To be added

Bias Discussion

  • To be added

Known Limitations

  • To be added

Additional Information

Baseline Systems

  • Baseline code is available on GitHub. It offers a reasonable performance starting point, especially for researchers new to the task.

License Information

  • The original data was created by Hitachi, Ltd. and NTT Corporation and is released under the Creative Commons Attribution‑NonCommercial‑ShareAlike 4.0 International (CC BY‑NC‑SA 4.0) license.

Topics

Anomalous Sound Detection
Machine State Monitoring

Source

Organization: hugging_face

Created: Unknown
