Fhrozen/dcase22_task3
The DCASE 2022 Task 3 dataset comprises the STARSS22 dataset and synthetic SELD mixtures. STARSS22 was collected jointly by Tampere University and Sony. It contains real‑world multichannel recordings with spatio‑temporal annotations of sound events, provided in two recording formats (first‑order Ambisonics and a tetrahedral microphone array), together with documentation of the annotation procedure and the recording specifications. It is suitable for training and evaluating machine‑listening models for sound event detection, sound source localization, and joint sound event localization and detection.
Dataset description and usage context
Dataset Overview
Name
Sony‑TAu Realistic Spatial Soundscapes 2022 (STARSS22)
Description
Overview
- Content: Multichannel sound‑scene recordings with temporal and spatial annotations of salient events.
- Collection Sites: Tampere University (TAU), Finland, and Sony, Japan.
- Recording Formats: Two 4‑channel spatial formats – microphone array (MIC) and first‑order Ambisonics (FOA).
- Purpose: Development dataset for DCASE 2022 sound event localization and detection tasks.
Recording Details
- Period: September 2021 – February 2022.
- Equipment: Eigenmike em32 microphone array and Ricoh Theta V 360° video recorder.
- Tracking System: Optitrack Flex 13 optical tracker.
- Duration: ~2 h from 70 SONY clips and ~3 h from 51 TAU clips.
Specifications
- Number of Recordings: 121 clips (70 from Sony and 51 from TAU).
- Number of Rooms: 11 distinct rooms.
- Sample Rate: 24 kHz.
- Formats: Two 4‑channel 3‑D recording formats.
- Target Event Classes: 13 categories.
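The channel count and sample rate listed above can be checked against a downloaded clip. The following is a minimal sketch using Python's standard `wave` module. It assumes the clips are plain PCM WAV files that `wave` can open; the function name `check_clip_format` is hypothetical and not part of the dataset tooling.

```python
import wave

EXPECTED_CHANNELS = 4         # both FOA and MIC are 4-channel formats
EXPECTED_SAMPLE_RATE = 24000  # 24 kHz, as listed in the specifications


def check_clip_format(path):
    """Read the WAV header and compare it with the listed specifications.

    Returns (ok, info), where info holds the values found in the file.
    """
    with wave.open(str(path), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    ok = (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate"] == EXPECTED_SAMPLE_RATE
    )
    return ok, info
```

Running this over every clip before training is a cheap way to catch truncated downloads or files that were resampled by an intermediate tool.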
Event Classes
- Count: 13 categories, including female speech, male speech, applause, telephone ring, and laughter.
Naming Convention
- File name pattern:
fold[fold number]_room[room number]_mix[recording number per room].wav
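The pattern above can be parsed with a regular expression. This is a minimal sketch, assuming each bracketed field is a run of decimal digits; the names `ClipId` and `parse_clip_name`, and the file name in the usage note, are illustrative and not taken from the dataset.

```python
import re
from typing import NamedTuple, Optional

# fold[fold number]_room[room number]_mix[recording number per room].wav
CLIP_NAME = re.compile(r"^fold(?P<fold>\d+)_room(?P<room>\d+)_mix(?P<mix>\d+)\.wav$")


class ClipId(NamedTuple):
    fold: int
    room: int
    mix: int


def parse_clip_name(file_name: str) -> Optional[ClipId]:
    """Return the fold, room, and recording numbers, or None if the name does not match."""
    match = CLIP_NAME.match(file_name)
    if match is None:
        return None
    return ClipId(int(match["fold"]), int(match["room"]), int(match["mix"]))
```

For a hypothetical name such as `fold4_room23_mix001.wav`, the function returns `ClipId(fold=4, room=23, mix=1)`; leading zeros are dropped by the integer conversion.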
Task Setup
- Train‑Test Split: Predefined splits provided.
- Evaluation: Models should be trained on the training split, with results reported on the test split.
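One way to apply a predefined split is to group clips by the fold number in their file names. The sketch below assumes that the fold number encodes the split, which this description does not state explicitly. The helper `split_by_fold` is hypothetical, and the fold numbers to pass in should be taken from the dataset's own README rather than from this example.

```python
import re

FOLD_PREFIX = re.compile(r"^fold(\d+)_")


def split_by_fold(file_names, train_folds, test_folds):
    """Partition clip file names into (train, test) lists by fold number.

    train_folds and test_folds are sets of integers supplied by the caller.
    Names without a fold prefix, or with a fold in neither set, are skipped.
    """
    train, test = [], []
    for name in file_names:
        match = FOLD_PREFIX.match(name)
        if match is None:
            continue  # not a clip that follows the naming pattern
        fold = int(match.group(1))
        if fold in train_folds:
            train.append(name)
        elif fold in test_folds:
            test.append(name)
    return train, test
```

Keeping the fold sets as arguments avoids hard-coding a split that may differ between the development and evaluation releases.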
Directory Structure
- Root: Contains README.md and LICENSE.
- Subfolders: foa_dev (Ambisonics) and mic_dev (microphone array), holding the respective recordings.
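Because the same scenes are provided in both formats, it can be useful to pair each FOA clip with its MIC counterpart. The following sketch assumes that matching clips share the same file name in `foa_dev` and `mic_dev`, and it searches recursively in case the clips sit in nested subfolders. Both points are assumptions, and `pair_formats` is a hypothetical helper.

```python
from pathlib import Path


def pair_formats(root):
    """Return a sorted list of (foa_path, mic_path) for clips present in both formats."""
    root = Path(root)
    # Index each format by file name; rglob also finds clips in nested folders.
    foa = {p.name: p for p in (root / "foa_dev").rglob("*.wav")}
    mic = {p.name: p for p in (root / "mic_dev").rglob("*.wav")}
    common = sorted(foa.keys() & mic.keys())
    return [(foa[name], mic[name]) for name in common]
```

Clips that exist in only one format are left out of the result, so comparing its length with the number of files in each folder gives a quick consistency check.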
Usage
- Applicable Tasks: Sound Event Detection (SED), Sound Source Localization, Sound Event Localization and Detection (SELD).
- Guidelines: Follow the DCASE 2022 challenge instructions, using the provided train‑test splits for model development and assessment.