Dataset asset · Open Source Community · Speech Signal Processing · Multi‑Speaker Separation
huckiyang/DiPCo
The DiPCo (Dinner Party Corpus) dataset, publicly released by Amazon, is intended to help speech scientists separate the signals of multiple overlapping speakers in reverberant rooms. It was created by staging dinner‑party scenarios with volunteers in a lab; each session involves four participants. The corpus includes near‑field and far‑field recordings together with detailed transcriptions for development and evaluation, and is released under the CDLA‑Permissive‑1.0 license.
Source
hugging_face
Created
Nov 28, 2025
Updated
Feb 6, 2024
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: DiPCo – Dinner Party Corpus
- Alias: DiPCo
Dataset Attributes
- Language: English (en)
- Task Categories: automatic‑speech‑recognition, voice‑activity‑detection
- Multilinguality: Monolingual
- Tags: speaker separation, speech recognition, microphone array processing
- License: CDLA‑Permissive‑1.0
- Size Range: 100 M < size < 100 G
- Annotation Creators: expert generated
- Language Creators: expert generated
Dataset Content
- Audio Format: WAV, 16 kHz, 16‑bit
- Recording Types:
- Near‑field (single‑channel microphone)
- Far‑field (7‑channel microphone array)
- File Naming Rules:
  - Near‑field: <session_id>_<speaker_id>.wav
  - Far‑field: <session_id>_<device_id>.<channel_id>.wav
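The naming rules above can be sketched as a small parser. The regular expressions (two‑digit session, speaker, and device IDs; `CH`‑prefixed channel IDs) are assumptions inferred from the examples on this card, not an official DiPCo utility.

```python
import re

# Patterns are assumptions based on the naming rules and examples on this card.
NEAR_FIELD = re.compile(r"^(?P<session_id>S\d{2})_(?P<speaker_id>P\d{2})\.wav$")
FAR_FIELD = re.compile(
    r"^(?P<session_id>S\d{2})_(?P<device_id>U\d{2})\.(?P<channel_id>CH\d)\.wav$"
)

def parse_filename(name: str) -> dict:
    """Return the IDs encoded in a DiPCo-style WAV file name."""
    for kind, pattern in (("near-field", NEAR_FIELD), ("far-field", FAR_FIELD)):
        m = pattern.match(name)
        if m:
            return {"type": kind, **m.groupdict()}
    raise ValueError(f"unrecognized file name: {name}")

print(parse_filename("S02_P05.wav"))
print(parse_filename("S02_U01.CH1.wav"))
```

For example, `S02_U01.CH1.wav` would be channel 1 of array device U01 in session S02.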
- Transcription Format: JSON
- Transcription Content: session ID, speaker ID, gender, mother tongue, language proficiency, transcript text, start time, end time, reference signal
Dataset Structure
DiPCo/
├── audio
│ ├── dev
│ └── eval
└── transcriptions
├── dev
└── eval
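Walking this layout can be sketched as follows. The helper assumes the corpus is extracted under a local `root` directory with per‑file WAV audio and per‑session JSON transcriptions, which this card implies but does not state.

```python
from pathlib import Path

def split_inventory(root: Path, split: str) -> tuple[list[Path], list[Path]]:
    """List the audio and transcription files for one split ("dev" or "eval").

    A minimal sketch assuming the directory layout shown on this card;
    the .wav/.json extensions are assumptions drawn from the card's content.
    """
    wavs = sorted((root / "audio" / split).glob("*.wav"))
    jsons = sorted((root / "transcriptions" / split).glob("*.json"))
    return wavs, jsons
```

A usage example would be `split_inventory(Path("DiPCo"), "dev")`, which returns the development‑set audio and transcription paths in sorted order.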
Session Details
- Number of Sessions: 10
- Participants per Session: 4
- Number of Devices: 5
- Channels per Device: 7
- Session Naming: <session_id> (e.g., S01, S02, …)
- Speaker Naming: <speaker_id> (e.g., P01, P02, …)
- Device Naming: <device_id> (e.g., U01, U02, …)
- Channel Naming: <channel_id> (e.g., CH1, CH2, …)
Development & Evaluation Sets
- Development Set: Sessions S02, S04, S05, S09, S10; total 2 h 43 min, 3,691 utterances
- Evaluation Set: Sessions S01, S03, S06, S07, S08; total 2 h 36 min, 3,619 utterances
Transcription Example
{
  "start_time": {
    "U01": "00:02:12.79",
    "U02": "00:02:12.79",
    "U03": "00:02:12.79",
    "U04": "00:02:12.79",
    "U05": "00:02:12.79",
    "close-talk": "00:02:12.79"
  },
  "end_time": {
    "U01": "00:02:14.84",
    "U02": "00:02:14.84",
    "U03": "00:02:14.84",
    "U04": "00:02:14.84",
    "U05": "00:02:14.84",
    "close-talk": "00:02:14.84"
  },
  "gender": "male",
  "mother_tongue": "U.S. English",
  "nativeness": "native",
  "ref": "close-talk",
  "session_id": "S02",
  "speaker_id": "P05",
  "words": "[noise] how do you like the food"
}
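A segment like the one above can be read with the standard library alone. The timestamp parser below and the choice to measure duration on the segment's `ref` signal are illustrative assumptions, not part of the dataset's tooling.

```python
import json

def to_seconds(ts: str) -> float:
    """Convert an "HH:MM:SS.ss" timestamp (format taken from the sample) to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

# Abbreviated copy of the sample segment shown above.
segment = json.loads("""{
  "start_time": {"U01": "00:02:12.79", "close-talk": "00:02:12.79"},
  "end_time": {"U01": "00:02:14.84", "close-talk": "00:02:14.84"},
  "ref": "close-talk",
  "session_id": "S02",
  "speaker_id": "P05",
  "words": "[noise] how do you like the food"
}""")

# Duration measured on the reference signal named by the "ref" field.
ref = segment["ref"]
duration = to_seconds(segment["end_time"][ref]) - to_seconds(segment["start_time"][ref])
print(f"{segment['speaker_id']}: {duration:.2f} s")
```

Note that non‑speech events appear inline in `words` as bracketed tags such as `[noise]`, so text normalization may be needed before scoring.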
License
- Type: CDLA‑Permissive‑1.0
- Details: see LICENSE file