
huckiyang/DiPCo

The DiPCo (Dinner Party Corpus) dataset, publicly released by Amazon, is intended to help speech scientists separate the signals of multiple speakers in reverberant rooms. It was recorded with volunteers in a lab setting that simulated dinner‑party scenarios, and each session involves four participants. The dataset provides near‑field and far‑field recordings together with detailed transcriptions, split into development and evaluation sets. It is released under the CDLA‑Permissive‑1.0 license.

Updated 2/6/2024
hugging_face

Description

Dataset Overview

Dataset Name

  • Name: DiPCo – Dinner Party Corpus
  • Alias: DiPCo

Dataset Attributes

  • Language: English (en)
  • Task Categories: automatic‑speech‑recognition, voice‑activity‑detection
  • Multilinguality: Monolingual
  • Tags: speaker separation, speech recognition, microphone array processing
  • License: CDLA‑Permissive‑1.0
  • Size Range: 100 M < size < 100 G
  • Annotation Creators: expert‑generated
  • Language Creators: expert‑generated

Dataset Content

  • Audio Format: WAV, 16 kHz, 16‑bit
  • Recording Types:
    • Near‑field (single‑channel microphone)
    • Far‑field (7‑channel microphone array)
  • File Naming Rules:
    • Near‑field: <session_id>_<speaker_id>.wav
    • Far‑field: <session_id>_<device_id>.<channel_id>.wav
  • Transcription Format: JSON
  • Transcription Content: session ID, speaker ID, gender, mother tongue, nativeness (language proficiency), transcript text, start and end times for each recording stream (every far‑field device plus close‑talk), and the reference signal
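The two file naming rules can be told apart mechanically: far‑field names carry a `.<channel_id>` part before `.wav`, near‑field names do not. The following is a minimal Python sketch. It assumes IDs follow the `S01`/`P01`/`U01`/`CH1` patterns shown under Session Details. The function name `parse_dipco_filename` is hypothetical and is not part of any DiPCo tooling.

```python
import re
from typing import Optional

# Hypothetical patterns derived from the naming rules in this card:
#   near-field: <session_id>_<speaker_id>.wav              e.g. S02_P05.wav
#   far-field:  <session_id>_<device_id>.<channel_id>.wav  e.g. S02_U01.CH1.wav
NEAR_FIELD = re.compile(r"^(?P<session>S\d+)_(?P<speaker>P\d+)\.wav$")
FAR_FIELD = re.compile(r"^(?P<session>S\d+)_(?P<device>U\d+)\.(?P<channel>CH\d+)\.wav$")


def parse_dipco_filename(name: str) -> Optional[dict]:
    """Return the IDs encoded in an audio file name, or None if neither rule matches."""
    m = FAR_FIELD.match(name)
    if m:
        return {"type": "far-field", **m.groupdict()}
    m = NEAR_FIELD.match(name)
    if m:
        return {"type": "near-field", **m.groupdict()}
    return None
```

For example, `parse_dipco_filename("S02_U01.CH1.wav")` yields session `S02`, device `U01` and channel `CH1`. A name matching neither rule returns `None` rather than raising, so the function can be used to filter a directory listing.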

Dataset Structure

DiPCo/
├── audio
│   ├── dev
│   └── eval
└── transcriptions
    ├── dev
    └── eval
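Files under `audio/dev` and `audio/eval` are expected to match the audio format listed under Dataset Content (WAV, 16 kHz, 16‑bit). The check below uses Python's standard `wave` module and is a minimal sketch. It assumes each file is mono. That assumption is inferred from the one‑file‑per‑channel far‑field naming rule; the card does not state it explicitly. `check_wav_format` is a hypothetical helper name.

```python
import wave


def check_wav_format(path: str, expected_channels: int = 1) -> bool:
    """Return True if the WAV file is 16 kHz, 16-bit PCM with the expected channel count.

    expected_channels defaults to 1 on the assumption that every DiPCo file
    (near-field, or one far-field channel) is stored as a mono WAV.
    """
    with wave.open(path, "rb") as wf:
        return (
            wf.getframerate() == 16000          # 16 kHz sample rate
            and wf.getsampwidth() == 2          # 2 bytes per sample = 16-bit
            and wf.getnchannels() == expected_channels
        )
```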

Session Details

  • Number of Sessions: 10
  • Participants per Session: 4
  • Number of Devices: 5
  • Channels per Device: 7
  • Session Naming: <session_id> (e.g., S01, S02, …)
  • Speaker Naming: <speaker_id> (e.g., P01, P02, …)
  • Device Naming: <device_id> (e.g., U01, U02, …)
  • Channel Naming: <channel_id> (e.g., CH1, CH2, …)
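With 5 devices and 7 channels per device, each session should contain 5 × 7 = 35 far‑field files. The sketch below lists those names. It assumes device IDs run U01 to U05 and channel IDs run CH1 to CH7, which is extrapolated from the "U01, U02, …" and "CH1, CH2, …" examples above. Near‑field names cannot be generated the same way, because the transcription example pairs session S02 with speaker P05. Speaker IDs therefore do not restart at P01 in each session. `far_field_files` is a hypothetical helper name.

```python
# Values taken from Session Details: 5 devices, 7 channels per device.
NUM_DEVICES = 5
CHANNELS_PER_DEVICE = 7


def far_field_files(session_id: str) -> list[str]:
    """List the far-field file names expected for one session.

    Assumes zero-padded device IDs (U01..U05) and unpadded channel IDs (CH1..CH7).
    """
    return [
        f"{session_id}_U{d:02d}.CH{c}.wav"
        for d in range(1, NUM_DEVICES + 1)
        for c in range(1, CHANNELS_PER_DEVICE + 1)
    ]
```

For instance, `far_field_files("S01")` starts with `S01_U01.CH1.wav` and ends with `S01_U05.CH7.wav`. Comparing this list against a directory listing is a quick way to spot missing channels.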

Development & Evaluation Sets

  • Development Set: Sessions S02, S04, S05, S09, S10; total 2 h 43 min, 3,691 utterances
  • Evaluation Set: Sessions S01, S03, S06, S07, S08; total 2 h 36 min, 3,619 utterances
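The development and evaluation sets each contain five of the ten sessions and do not overlap. A session ID alone therefore determines which `dev` or `eval` subdirectory to look in. The following is a small sketch; the `SPLITS` table and the `split_of` helper are hypothetical names, and the session lists are copied from this card.

```python
# Session-to-split assignment as listed in this card.
SPLITS = {
    "dev": ("S02", "S04", "S05", "S09", "S10"),
    "eval": ("S01", "S03", "S06", "S07", "S08"),
}


def split_of(session_id: str) -> str:
    """Return "dev" or "eval" for a session ID; raise KeyError for unknown IDs."""
    for split, sessions in SPLITS.items():
        if session_id in sessions:
            return split
    raise KeyError(f"unknown session: {session_id}")
```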

Transcription Example

{
  "start_time": {
    "U01": "00:02:12.79",
    "U02": "00:02:12.79",
    "U03": "00:02:12.79",
    "U04": "00:02:12.79",
    "U05": "00:02:12.79",
    "close-talk": "00:02:12.79"
  },
  "end_time": {
    "U01": "00:02:14.84",
    "U02": "00:02:14.84",
    "U03": "00:02:14.84",
    "U04": "00:02:14.84",
    "U05": "00:02:14.84",
    "close-talk": "00:02:14.84"
  },
  "gender": "male",
  "mother_tongue": "U.S. English",
  "nativeness": "native",
  "ref": "close-talk",
  "session_id": "S02",
  "speaker_id": "P05",
  "words": "[noise] how do you like the food"
}
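The `start_time` and `end_time` values are `HH:MM:SS.ss` strings keyed by recording stream. To cut the matching segment out of a 16 kHz WAV file, they must be converted to sample offsets. The following is a minimal sketch with hypothetical helper names. It assumes the keys are the far‑field device IDs (`U01` to `U05`) plus `close-talk`, as in the example above. In that example every stream has the same times. The card does not say whether they can differ, so the sketch indexes the stream you actually load.

```python
SAMPLE_RATE = 16000  # from the audio format: WAV, 16 kHz


def to_seconds(ts: str) -> float:
    """Convert an "HH:MM:SS.ss" timestamp to seconds."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def segment_samples(utt: dict, stream: str = "close-talk") -> tuple[int, int]:
    """Return (start, end) sample indices of an utterance for one recording stream.

    `stream` is a key of start_time/end_time, e.g. a device ID such as "U01"
    or "close-talk" for the near-field recording.
    """
    start = to_seconds(utt["start_time"][stream])
    end = to_seconds(utt["end_time"][stream])
    # round() guards against float error such as 132.79 * 16000 = 2124639.9999...
    return round(start * SAMPLE_RATE), round(end * SAMPLE_RATE)
```

For the example utterance, 00:02:12.79 maps to sample 2,124,640 and 00:02:14.84 maps to sample 2,157,440. The resulting segment is 32,800 samples long, or 2.05 s.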

License

  • Type: CDLA‑Permissive‑1.0
  • Details: see LICENSE file


Topics

Speech Signal Processing
Multi‑Speaker Separation

Source

Organization: hugging_face

Created: Unknown
