Fhrozen/dcase22_task3
The DCASE 2022 Task 3 dataset comprises the STARSS22 dataset and synthetic SELD mixtures, collected jointly by Tampere University and Sony. It includes multichannel recordings with spatio‑temporal annotations for sound event localization and detection. The recordings are real‑world sound scenes delivered in two formats (first‑order Ambisonics and tetrahedral microphone array), accompanied by documentation of the annotation procedure and the recording specifications. The dataset is suitable for training and evaluating machine‑listening models for sound event detection, source localization, and joint sound event localization and detection.
Description
Dataset Overview
Name
Sony‑TAu Realistic Spatial Soundscapes 2022 (STARSS22)
Description
Overview
- Content: Multichannel sound‑scene recordings with temporal and spatial annotations of salient events.
- Collection Sites: Tampere University (TAU), Finland, and Sony, Japan.
- Recording Formats: Two 4‑channel spatial formats – microphone array (MIC) and first‑order Ambisonics (FOA).
- Purpose: Development dataset for DCASE 2022 sound event localization and detection tasks.
Recording Details
- Period: September 2021 – February 2022.
- Equipment: Eigenmike em32 microphone array and Ricoh Theta V 360° video recorder.
- Tracking System: Optitrack Flex 13 optical tracker.
- Duration: ~2 h from 70 SONY clips and ~3 h from 51 TAU clips.
Specifications
- Number of Recordings: 121 clips (70 SONY + 51 TAU, matching the counts under Recording Details).
- Number of Rooms: 11 distinct rooms.
- Sample Rate: 24 kHz.
- Formats: Two 4‑channel 3‑D recording formats (MIC and FOA).
- Target Event Classes: 13 categories.
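The sample rate and channel count listed above can be checked per clip before training. The following is a minimal sketch, assuming the clips are plain PCM WAV files that Python's standard `wave` module can open; `check_clip` and the two constants are illustrative names, not part of the dataset tooling.

```python
import wave

EXPECTED_RATE_HZ = 24_000  # sample rate from the specifications above
EXPECTED_CHANNELS = 4      # both MIC and FOA are 4-channel formats


def check_clip(path):
    """Read the WAV header and compare it with the listed specifications.

    Returns (ok, info), where info holds the values found in the header.
    Assumption: the file is PCM WAV readable by the stdlib wave module.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    info = {
        "rate_hz": rate,
        "channels": channels,
        "duration_s": frames / rate,
    }
    ok = rate == EXPECTED_RATE_HZ and channels == EXPECTED_CHANNELS
    return ok, info
```

Summing `duration_s` over all clips gives a quick way to compare the total against the approximate durations listed under Recording Details.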
Event Classes
- Count: 13 categories, including female speech, male speech, applause, telephone ring, and laughter.
Naming Convention
- File name pattern:
fold[fold number]_room[room number]_mix[recording number per room].wav
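The naming pattern above can be parsed with a regular expression to recover the fold, room, and recording numbers. This is a minimal sketch: it assumes each bracketed field is a run of decimal digits (possibly zero-padded), and `ClipId` and `parse_clip_name` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# fold[fold number]_room[room number]_mix[recording number per room].wav
CLIP_NAME = re.compile(
    r"^fold(?P<fold>\d+)_room(?P<room>\d+)_mix(?P<mix>\d+)\.wav$"
)


class ClipId(NamedTuple):
    fold: int
    room: int
    mix: int


def parse_clip_name(name: str) -> Optional[ClipId]:
    """Return the numeric fields of a clip file name, or None if it does not match."""
    match = CLIP_NAME.match(name)
    if match is None:
        return None
    return ClipId(int(match["fold"]), int(match["room"]), int(match["mix"]))


# Hypothetical file name, used only to illustrate the pattern.
print(parse_clip_name("fold3_room21_mix001.wav"))
```

Returning `None` for non-matching names lets callers skip stray files such as README.md without raising.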
Task Setup
- Train‑Test Split: Predefined splits provided.
- Evaluation: Models should be trained on the training split, with results reported on the test split.
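One way to apply a predefined split is to group clips by the fold number in their file names. The sketch below assumes the split is keyed on that fold number, which this description does not state explicitly; the fold numbers in the usage lines are placeholders, so take the actual assignment from the dataset's README.md. `split_by_fold` is a hypothetical helper.

```python
import re

FOLD_PREFIX = re.compile(r"^fold(\d+)_")


def split_by_fold(names, train_folds, test_folds):
    """Partition clip file names into train/test lists by fold number.

    Assumption: the predefined split is expressed through the fold field
    of the file name. Names in neither fold set go to "unused".
    """
    train_folds, test_folds = set(train_folds), set(test_folds)
    overlap = train_folds & test_folds
    if overlap:
        raise ValueError(f"folds used for both train and test: {sorted(overlap)}")

    split = {"train": [], "test": [], "unused": []}
    for name in names:
        match = FOLD_PREFIX.match(name)
        fold = int(match.group(1)) if match else None
        if fold in train_folds:
            split["train"].append(name)
        elif fold in test_folds:
            split["test"].append(name)
        else:
            split["unused"].append(name)
    return split


# Placeholder names and fold numbers, for illustration only.
clips = ["fold1_room1_mix001.wav", "fold2_room2_mix001.wav"]
parts = split_by_fold(clips, train_folds=[1], test_folds=[2])
```

Rejecting overlapping fold sets up front guards against accidentally evaluating on training data.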
Directory Structure
- Root: Contains README.md and LICENSE.
- Subfolders: foa_dev (Ambisonic) and mic_dev (microphone array), holding the respective recordings.
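Given this layout, the two format folders can be indexed and matched clip by clip. The sketch below assumes that each recording appears under the same file name in both foa_dev and mic_dev, and it searches recursively because any nesting inside those folders is not described here; `index_format` and `pair_formats` are hypothetical helpers.

```python
from pathlib import Path


def index_format(root, subfolder):
    """Map file name -> path for every .wav file under root/subfolder."""
    folder = Path(root) / subfolder
    return {path.name: path for path in sorted(folder.rglob("*.wav"))}


def pair_formats(root):
    """Pair FOA and MIC versions of each clip by file name.

    Returns (pairs, unpaired): pairs is a list of (foa_path, mic_path),
    unpaired lists the names found in only one of the two folders.
    Assumption: both folders use identical file names per recording.
    """
    foa = index_format(root, "foa_dev")
    mic = index_format(root, "mic_dev")
    common = sorted(foa.keys() & mic.keys())
    pairs = [(foa[name], mic[name]) for name in common]
    unpaired = sorted(foa.keys() ^ mic.keys())
    return pairs, unpaired
```

A non-empty `unpaired` list is a cheap signal of an incomplete download before any audio is decoded.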
Usage
- Applicable Tasks: Sound Event Detection (SED), Sound Source Localization, Sound Event Localization and Detection (SELD).
- Guidelines: Follow the DCASE 2022 challenge instructions, using the provided train‑test splits for model development and assessment.
Source
Organization: hugging_face
Created: Unknown