pfb30/multi_woz_v22
The Multi‑Domain Wizard‑of‑Oz (MultiWOZ) dataset is a fully annotated collection of written human‑human dialogues spanning multiple domains and topics. Version 2.1 fixes numerous annotation errors from the original release, while version 2.2 further corrects dialogue state errors, redefines the ontology, and introduces standardized slot‑span annotations. The dataset supports tasks such as dialogue modeling, intent‑state tracking, and dialogue act prediction. It is split into training, validation, and test sets containing 8,437, 1,000, and 1,000 dialogues respectively.
Description
Dataset Overview
Name: Multi‑domain Wizard‑of‑Oz (MultiWOZ)
Version: v2.2
Language: English (en)
License: Apache‑2.0
Multilinguality: Monolingual
Size: 10K < n < 100K
Source: Original data
Task Categories:
- Text Generation (text-generation)
- Fill‑Mask (fill-mask)
- Token Classification (token-classification)
- Text Classification (text-classification)
Specific Tasks:
- Dialogue Modeling (dialogue-modeling)
- Multi‑class Classification (multi-class-classification)
- Parsing (parsing)
Dataset Information:
- Config Name: v2.2
- Features:
  - dialogue_id: unique dialogue identifier (string).
  - services: list of services mentioned in the dialogue (string sequence).
  - turns: sequence of dialogue turns, each containing:
    - turn_id: unique turn ID (string).
    - speaker: USER or SYSTEM (categorical).
    - utterance: text of the turn (string).
    - frames: intent and belief state (structured).
    - dialogue_acts: dialogue acts (structured).
- Splits:
  - train: 8,437 examples, 68,222,649 bytes.
  - validation: 1,000 examples, 8,990,945 bytes.
  - test: 1,000 examples, 9,027,095 bytes.
Dataset Size: 86,240,689 bytes
Download Size: 276,592,909 bytes
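As a quick consistency check on the figures above, the per-split example counts and byte sizes can be summed and compared with the reported dataset size. The sketch below only restates the numbers from this card; the variable names are illustrative and not part of any dataset API.

```python
# Split statistics copied from the card above (names here are illustrative).
splits = {
    "train": {"examples": 8_437, "bytes": 68_222_649},
    "validation": {"examples": 1_000, "bytes": 8_990_945},
    "test": {"examples": 1_000, "bytes": 9_027_095},
}

# Totals across all three splits.
total_examples = sum(s["examples"] for s in splits.values())
total_bytes = sum(s["bytes"] for s in splits.values())

# Share of dialogues in each split, as a fraction of the whole.
fractions = {name: s["examples"] / total_examples for name, s in splits.items()}

print(f"total examples: {total_examples:,}")
print(f"total bytes:    {total_bytes:,}")
for name, frac in fractions.items():
    print(f"{name:<10} {frac:.1%}")
```

The byte total matches the reported Dataset Size of 86,240,689 bytes, and the example total of 10,437 falls in the stated 10K < n < 100K size bucket.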
Structure
Data Instances: Complete multi‑turn dialogues with annotations per turn.
Fields: dialogue_id, services, turns (including turn_id, speaker, utterance, frames, dialogue_acts).
Splits: train, validation, test.
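To make the field layout concrete, here is a minimal sketch that walks one dialogue record and prints a transcript. The sample record is hand-written for illustration and is not taken from the dataset. The sketch assumes that turns is stored as a dictionary of parallel lists and that speaker labels are encoded as 0 for USER and 1 for SYSTEM; both are assumptions that should be checked against the actual data. The helper name iter_turns is hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple

# Assumed label order for the categorical speaker field (verify against the data).
SPEAKER_NAMES = ["USER", "SYSTEM"]


def iter_turns(dialogue: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield (turn_id, speaker_name, utterance) for each turn of one dialogue.

    Assumes `turns` is a dict of parallel lists keyed by field name.
    """
    turns = dialogue["turns"]
    for turn_id, speaker, utterance in zip(
        turns["turn_id"], turns["speaker"], turns["utterance"]
    ):
        # Accept either an integer class label or an already-decoded string.
        name = SPEAKER_NAMES[speaker] if isinstance(speaker, int) else speaker
        yield turn_id, name, utterance


# Hand-written, hypothetical record that follows the fields listed above.
# frames and dialogue_acts are omitted to keep the sketch short.
sample = {
    "dialogue_id": "EXAMPLE0001.json",
    "services": ["restaurant"],
    "turns": {
        "turn_id": ["0", "1"],
        "speaker": [0, 1],
        "utterance": [
            "I need a cheap restaurant.",
            "Which area do you prefer?",
        ],
    },
}

transcript = list(iter_turns(sample))
for turn_id, name, utterance in transcript:
    print(f"[{turn_id}] {name}: {utterance}")
```

If the records turn out to store turns as a list of per-turn dictionaries instead, the loop would iterate over that list directly rather than zipping parallel columns.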
Source
Organization: hugging_face
Created: Unknown