Dataset asset · Open Source Community · Artificial Intelligence · Audio Analysis

Fhrozen/dcase22_task3

The DCASE 2022 Task 3 dataset comprises the STARSS22 dataset and synthetic SELD mixtures, collected jointly by Tampere University and Sony. It includes multichannel recordings with spatio‑temporal annotations for sound event detection and localization tasks. The dataset features real‑world recordings in two formats (first‑order Ambisonics and a tetrahedral microphone array), along with detailed annotation procedures and specifications. It is suitable for training and evaluating machine‑listening models for sound event detection, source localization, and joint sound event localization and detection.

Source
hugging_face
Created
Nov 28, 2025
Updated
Oct 19, 2022
Signals
133 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Name

Sony‑TAu Realistic Spatial Soundscapes 2022 (STARSS22)

Description

Overview

  • Content: Multichannel sound‑scene recordings with temporal and spatial annotations of salient events.
  • Collection Sites: Tampere University (TAU), Finland, and Sony, Japan.
  • Recording Formats: Two 4‑channel spatial formats – microphone array (MIC) and first‑order Ambisonics (FOA).
  • Purpose: Development dataset for DCASE 2022 sound event localization and detection tasks.

Recording Details

  • Period: September 2021 – February 2022.
  • Equipment: Eigenmike em32 microphone array and Ricoh Theta V 360° video recorder.
  • Tracking System: Optitrack Flex 13 optical tracker.
  • Duration: ~2 h from 70 Sony clips and ~3 h from 51 TAU clips.

Specifications

  • Number of Recordings: 121 clips (70 Sony + 51 TAU).
  • Number of Rooms: 11 distinct rooms.
  • Sample Rate: 24 kHz.
  • Formats: Two 4‑channel 3‑D recording formats.
  • Target Event Classes: 13 categories.
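
These specifications (4‑channel audio at 24 kHz) can be verified programmatically. The sketch below writes a tiny synthetic WAV file with the dataset's channel count and sample rate, then reads the header back with the standard‑library `wave` module; the file name is purely illustrative and is not part of the actual dataset.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 24_000  # per the specifications above
CHANNELS = 4          # both FOA and MIC formats are 4-channel

def write_silent_clip(path, seconds=0.01):
    """Write a tiny silent 16-bit PCM clip with the dataset's specs."""
    n_frames = int(SAMPLE_RATE * seconds)
    with wave.open(path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(b"\x00\x00" * CHANNELS * n_frames)

def clip_specs(path):
    """Return (channels, sample_rate, frames) for a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getnchannels(), w.getframerate(), w.getnframes()

# Hypothetical file name following the dataset's naming convention.
path = os.path.join(tempfile.gettempdir(), "fold3_room21_mix001.wav")
write_silent_clip(path)
print(clip_specs(path))  # (4, 24000, 240)
```

Running `clip_specs` on a real clip from `foa_dev` or `mic_dev` should likewise report 4 channels at 24 kHz.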

Event Classes

  • Count: 13 categories (e.g., female speech, male speech, applause, telephone ringing, laughter).

Naming Convention

  • File name pattern: fold[fold number]_room[room number]_mix[recording number per room].wav
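
The pattern above encodes the fold, room, and per‑room recording numbers directly in each file name. A minimal sketch of extracting them with a regular expression (the example file name is hypothetical, chosen only to match the pattern):

```python
import re

# Matches the STARSS22 naming convention:
# fold[fold number]_room[room number]_mix[recording number per room].wav
NAME_RE = re.compile(r"fold(\d+)_room(\d+)_mix(\d+)\.wav$")

def parse_clip_name(name):
    """Return the fold, room, and recording numbers from a clip name."""
    m = NAME_RE.search(name)
    if m is None:
        raise ValueError(f"not a STARSS22 clip name: {name!r}")
    fold, room, mix = (int(g) for g in m.groups())
    return {"fold": fold, "room": room, "mix": mix}

print(parse_clip_name("fold3_room21_mix001.wav"))
# {'fold': 3, 'room': 21, 'mix': 1}
```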

Task Setup

  • Train‑Test Split: Predefined splits provided.
  • Evaluation: Train models on the training split and report results on the test split.
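
Since split membership is encoded in the fold number of each file name, a file list can be partitioned without any extra metadata. The sketch below assumes, for illustration, that fold 3 holds training clips and fold 4 holds test clips, which matches the DCASE 2022 Task 3 convention but should be confirmed against the challenge documentation; the file names are hypothetical.

```python
import re

FOLD_RE = re.compile(r"fold(\d+)_")

def split_clips(names, train_folds=(3,), test_folds=(4,)):
    """Partition clip file names into train/test lists by fold number."""
    train, test = [], []
    for name in names:
        m = FOLD_RE.search(name)
        if m is None:
            continue  # skip non-clip files such as README.md
        fold = int(m.group(1))
        if fold in train_folds:
            train.append(name)
        elif fold in test_folds:
            test.append(name)
    return train, test

clips = ["fold3_room21_mix001.wav", "fold4_room2_mix005.wav", "README.md"]
train, test = split_clips(clips)
print(train)  # ['fold3_room21_mix001.wav']
print(test)   # ['fold4_room2_mix005.wav']
```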

Directory Structure

  • Root: Contains README.md, LICENSE.
  • Subfolders: foa_dev (Ambisonic) and mic_dev (microphone array) holding the respective recordings.

Usage

  • Applicable Tasks: Sound Event Detection (SED), Sound Source Localization, Sound Event Localization and Detection (SELD).
  • Guidelines: Follow the DCASE 2022 challenge instructions, using the provided train‑test splits for model development and assessment.