Dataset asset · Open Source Community · Speech Signal Processing · Multi‑Speaker Separation

huckiyang/DiPCo

The DiPCo (Dinner Party Corpus) dataset, publicly released by Amazon, helps speech scientists separate multiple speakers' signals in reverberant rooms. It was created by staging dinner‑party sessions with volunteers in a lab, with four participants per session. The corpus includes near‑field and far‑field recordings together with detailed transcriptions for development and evaluation, and is released under the CDLA‑Permissive‑1.0 license.

Source
Hugging Face
Created
Nov 28, 2025
Updated
Feb 6, 2024
Signals
167 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • Name: DiPCo – Dinner Party Corpus
  • Alias: DiPCo

Dataset Attributes

  • Language: English (en)
  • Task Categories: automatic‑speech‑recognition, voice‑activity‑detection
  • Multilinguality: Monolingual
  • Tags: speaker separation, speech recognition, microphone array processing
  • License: CDLA‑Permissive‑1.0
  • Size Range: 100 M < size < 100 G
  • Annotation Creators: expert generated
  • Language Creators: expert generated
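
Given the repository ID above (huckiyang/DiPCo), a full local snapshot can be pulled with the huggingface_hub client. A minimal sketch; it uses only the documented snapshot_download API and downloads into the default local cache:

# Minimal download sketch: fetch the dataset snapshot from the Hugging Face Hub.
# Requires: pip install huggingface_hub
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="huckiyang/DiPCo", repo_type="dataset")
print("dataset downloaded to:", local_dir)

The returned path points at the local copy, which should contain the layout shown under Dataset Structure below.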

Dataset Content

  • Audio Format: WAV, 16 kHz, 16‑bit
  • Recording Types:
    • Near‑field (single‑channel microphone)
    • Far‑field (7‑channel microphone array)
  • File Naming Rules (parsed in the sketch after this list):
    • Near‑field: <session_id>_<speaker_id>.wav
    • Far‑field: <session_id>_<device_id>.<channel_id>.wav
  • Transcription Format: JSON
  • Transcription Content: session ID, speaker ID, gender, mother tongue, language proficiency, transcript text, start time, end time, reference signal
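
The naming rules above are regular enough to parse mechanically, and the audio format can be verified from the WAV header alone. A minimal sketch using only the standard library; the filenames at the end are illustrative:

# Sketch: classify a DiPCo audio file by its name and verify the stated
# format (WAV, 16 kHz, 16-bit).
import re
import wave

NEAR_FIELD = re.compile(r"^(?P<session>S\d{2})_(?P<speaker>P\d{2})\.wav$")
FAR_FIELD = re.compile(r"^(?P<session>S\d{2})_(?P<device>U\d{2})\.(?P<channel>CH\d)\.wav$")

def classify(filename):
    """Return (recording_type, ids) parsed from a DiPCo filename."""
    if m := NEAR_FIELD.match(filename):
        return "near-field", m.groupdict()
    if m := FAR_FIELD.match(filename):
        return "far-field", m.groupdict()
    raise ValueError(f"unrecognized DiPCo filename: {filename}")

def check_format(path):
    """Assert the WAV header matches the documented 16 kHz / 16-bit format."""
    with wave.open(path, "rb") as w:
        assert w.getframerate() == 16000, "expected 16 kHz"
        assert w.getsampwidth() == 2, "expected 16-bit samples"

print(classify("S02_P05.wav"))       # near-field: session S02, speaker P05
print(classify("S02_U01.CH1.wav"))   # far-field: session S02, device U01, CH1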

Dataset Structure

DiPCo/
├── audio
│   ├── dev
│   └── eval
└── transcriptions
    ├── dev
    └── eval
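
A short inventory sketch, assuming the tree above was extracted to a local directory named DiPCo:

# Sketch: count audio and transcription files per split.
from pathlib import Path

root = Path("DiPCo")  # assumed local path to the extracted corpus
for split in ("dev", "eval"):
    n_wav = len(list((root / "audio" / split).glob("*.wav")))
    n_json = len(list((root / "transcriptions" / split).glob("*.json")))
    print(f"{split}: {n_wav} audio files, {n_json} transcription files")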

Session Details

  • Number of Sessions: 10
  • Participants per Session: 4
  • Number of Devices: 5
  • Channels per Device: 7
  • Session Naming: <session_id> (e.g., S01, S02, …)
  • Speaker Naming: <speaker_id> (e.g., P01, P02, …)
  • Device Naming: <device_id> (e.g., U01, U02, …)
  • Channel Naming: <channel_id> (e.g., CH1, CH2, …); the sketch after this list combines these IDs into the expected far‑field filenames
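
Put together, these conventions make the full set of far‑field filenames for a session predictable. A minimal sketch:

# Sketch: expected far-field filenames for one session, derived from the
# conventions above (5 devices x 7 channels = 35 files per session).
def far_field_files(session_id):
    return [
        f"{session_id}_U{d:02d}.CH{c}.wav"
        for d in range(1, 6)   # devices U01..U05
        for c in range(1, 8)   # channels CH1..CH7
    ]

files = far_field_files("S02")
print(len(files))    # 35
print(files[0])      # S02_U01.CH1.wav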

Development & Evaluation Sets

  • Development Set: Sessions S02, S04, S05, S09, S10; total 2 h 43 min, 3,691 utterances
  • Evaluation Set: Sessions S01, S03, S06, S07, S08; total 2 h 36 min, 3,619 utterances (see the session‑to‑split sketch after this list)
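
For preprocessing, the published partition reduces to a small session‑to‑split map. A minimal sketch:

# Sketch: session-to-split map for routing files during preprocessing.
SPLITS = {
    "dev":  {"S02", "S04", "S05", "S09", "S10"},
    "eval": {"S01", "S03", "S06", "S07", "S08"},
}

def split_for(session_id):
    for split, sessions in SPLITS.items():
        if session_id in sessions:
            return split
    raise KeyError(f"unknown session: {session_id}")

print(split_for("S02"))   # dev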

Transcription Example

{
  "start_time": {
    "U01": "00:02:12.79",
    "U02": "00:02:12.79",
    "U03": "00:02:12.79",
    "U04": "00:02:12.79",
    "U05": "00:02:12.79",
    "close-talk": "00:02:12.79"
  },
  "end_time": {
    "U01": "00:02:14.84",
    "U02": "00:02:14.84",
    "U03": "00:02:14.84",
    "U04": "00:02:14.84",
    "U05": "00:02:14.84",
    "close-talk": "00:02:14.84"
  },
  "gender": "male",
  "mother_tongue": "U.S. English",
  "nativeness": "native",
  "ref": "close-talk",
  "session_id": "S02",
  "speaker_id": "P05",
  "words": "[noise] how do you like the food"
}
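
A minimal sketch for consuming such records, assuming each transcription file holds a list of them (the path is hypothetical):

# Sketch: read one transcription file and print utterance boundaries in seconds.
import json

def to_seconds(ts):
    """Convert an 'HH:MM:SS.ss' timestamp to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

with open("DiPCo/transcriptions/dev/S02.json") as f:  # hypothetical path
    utterances = json.load(f)

for utt in utterances[:5]:
    ref = utt["ref"]  # reference signal, e.g. "close-talk"
    start = to_seconds(utt["start_time"][ref])
    end = to_seconds(utt["end_time"][ref])
    print(f'{utt["speaker_id"]} [{start:.2f}-{end:.2f} s]: {utt["words"]}')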

License

  • Type: CDLA‑Permissive‑1.0
  • Details: see LICENSE file