
pfb30/multi_woz_v22

The Multi‑Domain Wizard‑of‑Oz (MultiWOZ) dataset is a fully annotated collection of written human‑human dialogues spanning multiple domains and topics. Version 2.1 fixed numerous annotation errors in the original release; version 2.2 further corrects dialogue state errors, redefines the ontology, and introduces standardized slot‑span annotations. The dataset supports tasks such as dialogue modeling, intent‑state tracking, and dialogue act prediction. It is split into training, validation, and test sets containing 8,437, 1,000, and 1,000 dialogues, respectively (10,437 in total).

Updated 1/18/2024

Description

Dataset Overview

Name: Multi‑domain Wizard‑of‑Oz (MultiWOZ)

Version: v2.2

Language: English (en)

License: Apache‑2.0

Multilinguality: Monolingual

Size: 10K < n < 100K

Source: Original data

Task Categories:

  • Text Generation (text-generation)
  • Fill‑Mask (fill-mask)
  • Token Classification (token-classification)
  • Text Classification (text-classification)

Specific Tasks:

  • Dialogue Modeling (dialogue-modeling)
  • Multi‑class Classification (multi-class-classification)
  • Parsing (parsing)

Dataset Information:

  • Config Name: v2.2
  • Features:
    • dialogue_id: unique identifier (string).
    • services: list of services mentioned (string sequence).
    • turns: sequence of dialogue turns, each containing:
      • turn_id: unique turn ID (string).
      • speaker: USER or SYSTEM (categorical).
      • utterance: text of the turn (string).
      • frames: intent and belief state (structured).
      • dialogue_acts: dialogue acts (structured).
  • Splits:
    • train: 8,437 examples, 68,222,649 bytes.
    • validation: 1,000 examples, 8,990,945 bytes.
    • test: 1,000 examples, 9,027,095 bytes.

Dataset Size: 86,240,689 bytes

Download Size: 276,592,909 bytes
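
The split figures above can be cross-checked with a few lines of arithmetic. The sketch below simply sums the per-split example counts and byte sizes as listed on this card; the `splits` dictionary is a hand-copied illustration, not something read from the dataset itself.

```python
# Split statistics copied by hand from the listing above (illustrative only).
splits = {
    "train": {"examples": 8_437, "bytes": 68_222_649},
    "validation": {"examples": 1_000, "bytes": 8_990_945},
    "test": {"examples": 1_000, "bytes": 9_027_095},
}

# Sum the per-split counts and sizes.
total_examples = sum(s["examples"] for s in splits.values())
total_bytes = sum(s["bytes"] for s in splits.values())

print(total_examples)  # 10437
print(total_bytes)     # 86240689

# Report each split's share of the dialogues.
for name, s in splits.items():
    share = s["examples"] / total_examples
    print(f"{name}: {share:.1%} of dialogues")
```

The byte total matches the stated Dataset Size, and the example total falls inside the stated 10K < n < 100K size bucket.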

Structure

Data Instances: Complete multi‑turn dialogues with annotations per turn.

Fields: dialogue_id, services, turns (including turn_id, speaker, utterance, frames, dialogue_acts).

Splits: train, validation, test.
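
To make the field layout concrete, here is a minimal sketch of one record and a helper that walks its turns. The record is hand-written for illustration, and `iter_turns` and `SPEAKER_NAMES` are hypothetical names. It assumes that `turns` is stored column-wise (one list per field, as Hugging Face sequence features are commonly exposed) and that `speaker` is an integer class label with 0 = USER and 1 = SYSTEM; check both assumptions against the actual data before relying on them.

```python
from typing import Iterator, Tuple

# Assumption: speaker is an integer class label in this order.
SPEAKER_NAMES = ["USER", "SYSTEM"]

# Hypothetical record following the documented fields (not real data).
example_dialogue = {
    "dialogue_id": "EXAMPLE0001.json",
    "services": ["restaurant"],
    "turns": {
        "turn_id": ["0", "1"],
        "speaker": [0, 1],
        "utterance": [
            "I need a cheap restaurant in the centre.",
            "Sure, what type of food would you like?",
        ],
        # frames and dialogue_acts are left empty in this sketch.
        "frames": [{}, {}],
        "dialogue_acts": [{}, {}],
    },
}


def iter_turns(dialogue: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (turn_id, speaker_name, utterance) for each turn, in order."""
    turns = dialogue["turns"]
    for turn_id, speaker, utterance in zip(
        turns["turn_id"], turns["speaker"], turns["utterance"]
    ):
        yield turn_id, SPEAKER_NAMES[speaker], utterance


for turn_id, speaker, utterance in iter_turns(example_dialogue):
    print(f"[{turn_id}] {speaker}: {utterance}")
```

With the Hugging Face `datasets` library, a real record would presumably come from `load_dataset("pfb30/multi_woz_v22")`; the sketch does not depend on that call.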


Topics

Dialogue Systems
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
