Fhrozen/dcase22_task3

The DCASE 2022 Task 3 dataset comprises the STARSS22 dataset, recorded jointly by Tampere University and Sony, together with synthetic SELD mixtures. It provides multichannel recordings with spatio‑temporal annotations for sound event localization and detection. The STARSS22 recordings are real‑world sound scenes captured in two formats (first‑order Ambisonics and a tetrahedral microphone array), and the accompanying documentation describes the annotation procedure and the dataset specifications. The dataset is suitable for training and evaluating machine‑listening models for sound event detection, sound source localization, and joint sound event localization and detection (SELD).

Source

Hugging Face

Created

Nov 28, 2025

Updated

Oct 19, 2022

Dataset Overview

Name

Sony‑TAu Realistic Spatial Soundscapes 2022 (STARSS22)

Description

Overview

Content: Multichannel sound‑scene recordings with temporal and spatial annotations of salient events.
Collection Sites: Tampere University (TAU), Finland, and Sony, Japan.
Recording Formats: Two 4‑channel spatial formats – microphone array (MIC) and first‑order Ambisonics (FOA).
Purpose: Development dataset for DCASE 2022 sound event localization and detection tasks.

Recording Details

Period: September 2021 – February 2022.
Equipment: Eigenmike em32 microphone array and Ricoh Theta V 360° video recorder.
Tracking System: Optitrack Flex 13 optical tracker.
Duration: ~2 h from 70 SONY clips and ~3 h from 51 TAU clips.

Specifications

Number of Recordings: 121 clips (70 from Sony and 51 from TAU).
Number of Rooms: 11 distinct rooms.
Sample Rate: 24 kHz.
Formats: Two 4‑channel 3‑D recording formats (FOA and MIC).
Target Event Classes: 13 categories.
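
A quick sanity check against the specifications above can be sketched as follows. It is a minimal sketch, assuming the clips are plain PCM WAV files that Python's standard-library wave module can read; the function name matches_spec and the two constants are illustrative and not part of the dataset.

```python
import wave

# Values taken from the specifications listed above.
EXPECTED_RATE_HZ = 24_000
EXPECTED_CHANNELS = 4


def matches_spec(path: str) -> bool:
    """Return True if the WAV header reports 24 kHz and 4 channels.

    Assumption: the file is uncompressed PCM; wave.open raises
    wave.Error for formats it does not understand.
    """
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == EXPECTED_RATE_HZ
            and wf.getnchannels() == EXPECTED_CHANNELS
        )
```

Only the header is read, so the check stays cheap even when run over every clip before training.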

Event Classes

Count: 13 categories, including female speech, male speech, clapping, telephone, and laughter.

Naming Convention

File name pattern: fold[fold number]_room[room number]_mix[recording number per room].wav
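
The naming pattern above can be parsed with a regular expression. The sketch below assumes that each bracketed field is a plain decimal integer; the names ClipId and parse_clip_name, and the sample file name, are hypothetical and used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Built from the documented pattern; treating every field as a decimal
# integer is an assumption, not something the dataset card states.
CLIP_NAME = re.compile(
    r"^fold(?P<fold>\d+)_room(?P<room>\d+)_mix(?P<mix>\d+)\.wav$"
)


class ClipId(NamedTuple):
    fold: int
    room: int
    mix: int


def parse_clip_name(name: str) -> Optional[ClipId]:
    """Split a clip file name into fold, room, and recording number.

    Returns None when the name does not follow the pattern.
    """
    m = CLIP_NAME.match(name)
    if m is None:
        return None
    return ClipId(int(m["fold"]), int(m["room"]), int(m["mix"]))


# Hypothetical file name, made up for this example.
print(parse_clip_name("fold4_room23_mix001.wav"))
```

Returning None for non-matching names lets callers skip stray files such as README.md without raising.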

Task Setup

Train‑Test Split: Predefined splits provided.
Evaluation: Models should be trained on the training split, with results reported on the test split.
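
One way to apply a predefined split is to group clips by the fold number in their file names. The sketch below is an illustration under stated assumptions: this page does not say which folds form the training and test splits, so the fold numbers 3 and 4 in the usage line are placeholders to check against the dataset README, and split_by_fold is a hypothetical helper.

```python
import re
from typing import Dict, Iterable, List, Set

FOLD = re.compile(r"^fold(\d+)_")


def split_by_fold(
    names: Iterable[str], train_folds: Set[int], test_folds: Set[int]
) -> Dict[str, List[str]]:
    """Group file names into train and test lists by fold number.

    Names without a fold prefix, or with a fold in neither set, are
    left out of both lists.
    """
    split: Dict[str, List[str]] = {"train": [], "test": []}
    for name in names:
        m = FOLD.match(name)
        if m is None:
            continue
        fold = int(m.group(1))
        if fold in train_folds:
            split["train"].append(name)
        elif fold in test_folds:
            split["test"].append(name)
    return split


# Placeholder fold numbers and made-up file names; verify the real
# assignment in the dataset README before relying on it.
clips = ["fold3_room4_mix001.wav", "fold4_room2_mix005.wav"]
print(split_by_fold(clips, train_folds={3}, test_folds={4}))
```

Keeping the fold sets as parameters avoids hard-coding a split that this page does not document.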

Directory Structure

Root: Contains README.md, LICENSE.
Subfolders: foa_dev (Ambisonic) and mic_dev (microphone array) holding the respective recordings.
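
The layout above can be walked with pathlib. This is a minimal sketch, assuming only that the .wav files sit somewhere beneath foa_dev and mic_dev; the function name list_recordings and the "foa"/"mic" keys are illustrative choices.

```python
from pathlib import Path
from typing import Dict, List


def list_recordings(root: Path) -> Dict[str, List[Path]]:
    """Collect the .wav files under foa_dev and mic_dev, keyed by format.

    A missing subfolder yields an empty list rather than an error.
    """
    found: Dict[str, List[Path]] = {}
    for fmt, sub in (("foa", "foa_dev"), ("mic", "mic_dev")):
        folder = root / sub
        # rglob searches recursively, so nested folders are covered
        # without assuming what they are called.
        found[fmt] = sorted(folder.rglob("*.wav")) if folder.is_dir() else []
    return found
```

Calling list_recordings(Path("path/to/dataset")) returns both file lists, which can then be compared to confirm that each FOA clip has a MIC counterpart.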

Usage

Applicable Tasks: Sound Event Detection (SED), Sound Source Localization, Sound Event Localization and Detection (SELD).
Guidelines: Follow the DCASE 2022 challenge instructions, using the provided train‑test splits for model development and assessment.
