huckiyang/DiPCo
The DiPCo (Dinner Party Corpus) dataset, publicly released by Amazon, is intended to help speech scientists separate the signals of multiple speakers in reverberant rooms. It was recorded in a lab, where volunteers simulated dinner‑party scenarios with four participants per session. It includes near‑field and far‑field recordings, together with detailed transcriptions, split into development and evaluation sets. The dataset is released under the CDLA‑Permissive‑1.0 license.
Description
Dataset Overview
Dataset Name
- Name: DipCo – Dinner Party Corpus
- Alias: DiPCo
Dataset Attributes
- Language: English (en)
- Task Categories: automatic‑speech‑recognition, voice‑activity‑detection
- Multilinguality: Monolingual
- Tags: speaker separation, speech recognition, microphone array processing
- License: CDLA‑Permissive‑1.0
- Size Range: 100 M < size < 100 G
- Annotation Creators: expert generated
- Language Creators: expert generated
Dataset Content
- Audio Format: WAV, 16 kHz, 16‑bit
- Recording Types:
  - Near‑field (single‑channel microphone)
  - Far‑field (7‑channel microphone array)
- File Naming Rules:
  - Near‑field: <session_id>_<speaker_id>.wav
  - Far‑field: <session_id>_<device_id>.<channel_id>.wav
- Transcription Format: JSON
- Transcription Content: session ID, speaker ID, gender, mother tongue, language proficiency, transcript text, start time, end time, reference signal
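The file naming rules above can be turned into a small parser. The following is a minimal sketch, not part of the dataset's own tooling: the function name `parse_audio_name` is hypothetical, and the patterns assume that IDs are a letter prefix followed by digits (as in S01, P01, U01, CH1).

```python
import re
from typing import Optional

# Assumption: IDs consist of a fixed prefix (S, P, U, CH) followed by digits.
NEAR_FIELD = re.compile(r"^(?P<session_id>S\d+)_(?P<speaker_id>P\d+)\.wav$")
FAR_FIELD = re.compile(
    r"^(?P<session_id>S\d+)_(?P<device_id>U\d+)\.(?P<channel_id>CH\d+)\.wav$"
)


def parse_audio_name(name: str) -> Optional[dict]:
    """Split a DiPCo-style file name into its ID fields.

    Returns None when the name matches neither naming rule.
    """
    for kind, pattern in (("near-field", NEAR_FIELD), ("far-field", FAR_FIELD)):
        match = pattern.match(name)
        if match:
            return {"type": kind, **match.groupdict()}
    return None


if __name__ == "__main__":
    print(parse_audio_name("S02_P05.wav"))
    print(parse_audio_name("S02_U01.CH1.wav"))
```

Returning `None` for unknown names, rather than raising, makes it easy to skip unrelated files when scanning a directory.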
Dataset Structure
DiPCo/
├── audio
│   ├── dev
│   └── eval
└── transcriptions
    ├── dev
    └── eval
Session Details
- Number of Sessions: 10
- Participants per Session: 4
- Number of Devices: 5
- Channels per Device: 7
- Session Naming: <session_id> (e.g., S01, S02, …)
- Speaker Naming: <speaker_id> (e.g., P01, P02, …)
- Device Naming: <device_id> (e.g., U01, U02, …)
- Channel Naming: <channel_id> (e.g., CH1, CH2, …)
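With 5 devices and 7 channels per device, each session should have 5 × 7 = 35 far‑field files. The sketch below lists the expected names for one session; the helper `far_field_names` is hypothetical, and the two‑digit device numbering and 1‑based channel numbering are assumptions taken from the examples (U01, CH1).

```python
from typing import List


def far_field_names(session_id: str, n_devices: int = 5, n_channels: int = 7) -> List[str]:
    """List the far-field file names expected for one session.

    Assumes devices are numbered U01..U05 and channels CH1..CH7.
    """
    return [
        f"{session_id}_U{device:02d}.CH{channel}.wav"
        for device in range(1, n_devices + 1)
        for channel in range(1, n_channels + 1)
    ]


if __name__ == "__main__":
    names = far_field_names("S01")
    print(len(names))           # 35
    print(names[0], names[-1])  # S01_U01.CH1.wav S01_U05.CH7.wav
```

Comparing this list against the files actually present in a session folder is a quick way to spot an incomplete download.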
Development & Evaluation Sets
- Development Set: Sessions S02, S04, S05, S09, S10; total 2 h 43 min, 3,691 utterances
- Evaluation Set: Sessions S01, S03, S06, S07, S08; total 2 h 36 min, 3,619 utterances
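The split above can be written down as a lookup table, which also makes the combined totals easy to check. This is an illustrative sketch only; the names `SPLITS`, `STATS`, and `split_of` are hypothetical, and the figures are copied from the two bullets above.

```python
# Session lists and statistics copied from the split description above.
SPLITS = {
    "dev": ["S02", "S04", "S05", "S09", "S10"],
    "eval": ["S01", "S03", "S06", "S07", "S08"],
}
# Per split: (hours, minutes, utterances)
STATS = {"dev": (2, 43, 3691), "eval": (2, 36, 3619)}


def split_of(session_id: str) -> str:
    """Return 'dev' or 'eval' for a session ID, or raise KeyError."""
    for name, sessions in SPLITS.items():
        if session_id in sessions:
            return name
    raise KeyError(session_id)


total_minutes = sum(h * 60 + m for h, m, _ in STATS.values())
total_utterances = sum(u for _, _, u in STATS.values())

if __name__ == "__main__":
    hours, minutes = divmod(total_minutes, 60)
    print(split_of("S02"))                          # dev
    print(f"{hours} h {minutes} min")               # 5 h 19 min
    print(total_utterances)                         # 7310
```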
Transcription Example
{
"start_time": {
"U01": "00:02:12.79",
"U02": "00:02:12.79",
"U03": "00:02:12.79",
"U04": "00:02:12.79",
"U05": "00:02:12.79",
"close-talk": "00:02:12.79"
},
"end_time": {
"U01": "00:02:14.84",
"U02": "00:02:14.84",
"U03": "00:02:14.84",
"U04": "00:02:14.84",
"U05": "00:02:14.84",
"close-talk": "00:02:14.84"
},
"gender": "male",
"mother_tongue": "U.S. English",
"nativeness": "native",
"ref": "close-talk",
"session_id": "S02",
"speaker_id": "P05",
"words": "[noise] how do you like the food"
}
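To cut an utterance out of a recording, the timestamps in a record like the one above have to be converted to sample offsets. The following is a minimal sketch that assumes timestamps are always of the form hh:mm:ss.ss and that the audio is sampled at 16 kHz as stated under Dataset Content; the helper names `to_seconds` and `sample_range` are hypothetical.

```python
SAMPLE_RATE = 16_000  # Hz, per the audio format listed for the dataset


def to_seconds(timestamp: str) -> float:
    """Convert an 'hh:mm:ss.ss' timestamp to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def sample_range(utterance: dict, source: str = "close-talk",
                 sample_rate: int = SAMPLE_RATE) -> tuple:
    """Return (start, end) sample indices for one utterance.

    `source` selects which timing to use: a device ID such as 'U01'
    or 'close-talk', matching the keys of start_time and end_time.
    """
    start = round(to_seconds(utterance["start_time"][source]) * sample_rate)
    end = round(to_seconds(utterance["end_time"][source]) * sample_rate)
    return start, end


if __name__ == "__main__":
    # Trimmed copy of the example record above.
    utterance = {
        "start_time": {"U01": "00:02:12.79", "close-talk": "00:02:12.79"},
        "end_time": {"U01": "00:02:14.84", "close-talk": "00:02:14.84"},
        "words": "[noise] how do you like the food",
    }
    start, end = sample_range(utterance)
    print(start, end)                   # 2124640 2157440
    print((end - start) / SAMPLE_RATE)  # 2.05
```

The timings are stored per device and for the close‑talk microphone, so the `source` argument should match the recording being sliced.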
License
- Type: CDLA‑Permissive‑1.0
- Details: see LICENSE file
Source
Organization: hugging_face
Created: Unknown