pfb30/multi_woz_v22
The Multi‑Domain Wizard‑of‑Oz (MultiWOZ) dataset is a fully annotated collection of written human‑human dialogues spanning multiple domains and topics. Version 2.1 fixes numerous annotation errors from the original release, while version 2.2 further corrects dialogue state errors, redefines the ontology, and introduces standardized slot‑span annotations. The dataset supports tasks such as dialogue modeling, intent‑state tracking, and dialogue act prediction. It is split into training, validation, and test sets containing 8,437, 1,000, and 1,000 dialogues respectively.
Dataset Overview
Name: Multi‑domain Wizard‑of‑Oz (MultiWOZ)
Version: v2.2
Language: English (en)
License: Apache‑2.0
Multilinguality: Monolingual
Size category: 10K < n < 100K
Source: Original data
Task Categories:
- Text Generation (text-generation)
- Fill‑Mask (fill-mask)
- Token Classification (token-classification)
- Text Classification (text-classification)
Specific Tasks:
- Dialogue Modeling (dialogue-modeling)
- Multi‑class Classification (multi-class-classification)
- Parsing (parsing)
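For the dialogue-modeling task, one common framing turns each dialogue into (context, response) pairs: every SYSTEM turn is a target response, and the turns before it form the context. The sketch below assumes records follow the Hugging Face datasets convention of storing the turns sequence as a dict of lists, and that speaker is encoded as a class index with 0 = USER and 1 = SYSTEM. The function name context_response_pairs, the max_context window, and the example dialogue are hypothetical and not part of this card.

```python
from typing import Dict, List, Tuple

# Assumed speaker encoding (class index order USER, SYSTEM); verify against the
# loaded dataset's features before relying on it.
USER, SYSTEM = 0, 1


def context_response_pairs(turns: Dict[str, list], max_context: int = 4) -> List[Tuple[str, str]]:
    """Turn one dialogue into (context, response) pairs.

    Each SYSTEM turn that has at least one preceding turn becomes a response;
    up to max_context preceding turns are joined into the context string.
    """
    speakers = turns["speaker"]
    utterances = turns["utterance"]
    pairs = []
    for i, (speaker, text) in enumerate(zip(speakers, utterances)):
        if speaker != SYSTEM or i == 0:
            continue
        start = max(0, i - max_context)
        # Prefix each history turn with its speaker tag.
        history = [
            f"{'USER' if s == USER else 'SYSTEM'}: {u}"
            for s, u in zip(speakers[start:i], utterances[start:i])
        ]
        pairs.append((" ".join(history), text))
    return pairs


# Hypothetical dialogue in the assumed dict-of-lists layout.
example_turns = {
    "turn_id": ["0", "1", "2", "3"],
    "speaker": [USER, SYSTEM, USER, SYSTEM],
    "utterance": [
        "I need a cheap hotel in the north.",
        "Sure, how many nights?",
        "Three nights, please.",
        "Booked. Anything else?",
    ],
}

for context, response in context_response_pairs(example_turns):
    print(f"{context!r} -> {response!r}")
```

A fixed max_context window keeps model inputs bounded, and tagging each history turn with its speaker lets a model distinguish user requests from system replies.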
Dataset Information:
- Config Name: v2.2
- Features:
  - dialogue_id: unique dialogue identifier (string)
  - services: services (domains) mentioned in the dialogue (sequence of strings)
  - turns: sequence of dialogue turns, each containing:
    - turn_id: turn identifier (string)
    - speaker: USER or SYSTEM (categorical)
    - utterance: text of the turn (string)
    - frames: per-service intent and belief state annotations (structured)
    - dialogue_acts: dialogue act annotations (structured)
- Splits:
  - train: 8,437 examples, 68,222,649 bytes
  - validation: 1,000 examples, 8,990,945 bytes
  - test: 1,000 examples, 9,027,095 bytes
Dataset Size: 86,240,689 bytes
Download Size: 276,592,909 bytes
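The per-split figures are internally consistent: the three split sizes add up to exactly the listed Dataset Size, and the training split holds roughly 81% of the 10,437 dialogues. A quick check, with the numbers copied from the card (the SPLITS name is only for illustration):

```python
# Split statistics as listed on the card: (number of dialogues, size in bytes).
SPLITS = {
    "train": (8_437, 68_222_649),
    "validation": (1_000, 8_990_945),
    "test": (1_000, 9_027_095),
}

total_dialogues = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

print(total_dialogues)  # 10437
print(total_bytes)      # 86240689, matching the listed Dataset Size
for name, (n, _) in SPLITS.items():
    print(f"{name}: {n / total_dialogues:.1%} of dialogues")
```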
Structure
Data Instances: Each instance is a complete multi‑turn dialogue with per‑turn annotations.
Fields: dialogue_id, services, turns (including turn_id, speaker, utterance, frames, dialogue_acts).
Splits: train, validation, test.
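To inspect individual dialogues, a minimal sketch follows. It assumes the Hugging Face datasets library can load the repository under the id and config name shown on this card (pfb30/multi_woz_v22, v2.2), that each record's turns field comes back as a dict of lists, and that speaker is a class index in the order USER, SYSTEM. These are assumptions to verify against the loaded data; load_split, summarize_dialogue, and the example record are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Any, Dict

# Assumed ClassLabel order for the speaker field (index 0 = USER, 1 = SYSTEM).
SPEAKER_NAMES = ["USER", "SYSTEM"]


def load_split(split: str = "train"):
    """Load one split with the Hugging Face datasets library (needs network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs without it

    return load_dataset("pfb30/multi_woz_v22", "v2.2", split=split)


def summarize_dialogue(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return the dialogue id, its services, and the number of turns per speaker."""
    turns = record["turns"]
    counts = Counter(SPEAKER_NAMES[s] for s in turns["speaker"])
    return {
        "dialogue_id": record["dialogue_id"],
        "services": list(record["services"]),
        "turns_per_speaker": dict(counts),
    }


# Hypothetical record shaped like the fields listed above
# (frames and dialogue_acts omitted for brevity).
example = {
    "dialogue_id": "EXAMPLE_0001.json",
    "services": ["hotel"],
    "turns": {
        "turn_id": ["0", "1", "2"],
        "speaker": [0, 1, 0],
        "utterance": ["I need a hotel.", "Which area?", "The north, please."],
    },
}

print(summarize_dialogue(example))
```

If the assumptions hold, passing any record from load_split("validation") to summarize_dialogue should produce the same kind of summary for real data.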