Tags: Open Source Community, Human‑Computer Interaction, Emotion and Intent Analysis

MC-EIU

The MC‑EIU dataset, created by Inner Mongolia University and partner institutions, is a comprehensive multimodal dialogue dataset for joint emotion and intent understanding. It contains 4,970 dialogue video clips (56,012 utterances) covering 7 emotions and 9 intents, supporting text, acoustic, and visual modalities in both English and Mandarin. The dataset was built through data collection, preprocessing, and multi‑round annotation to ensure quality and diversity. MC‑EIU targets human‑computer interaction research, with the aim of helping conversational systems better understand human needs and respond with empathy.

Source: arXiv
Created: Jul 3, 2024
Updated: Jul 4, 2024

MC‑EIU Dataset Analysis

Dataset Download

  • Baidu Cloud Link: Link
  • Extraction Code: Available by emailing the authors after the paper is accepted.

Dataset Analysis

Data Visualization

  • Figure 1: Visualization of the correlation between emotions and intents in the MC‑EIU dataset. Each circle represents the sample count for a specific "emotion‑intent" pair; larger circles indicate more samples and a stronger correlation.

Correlation Analysis

  • Datasets: MC‑EIU‑English and MC‑EIU‑Mandarin
  • Matrix Representation: Two 7 × 9 matrices where each element indicates the sample count for an "emotion‑intent" pair.
  • Visualization Method: Circle radius proportional to sample count, plotted at the corresponding matrix position.
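The matrix construction and circle sizing described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes each utterance's annotations are available as (emotion, intent) string pairs, and the helper names `count_matrix` and `circle_radii`, as well as the toy labels and pairs in the usage example, are hypothetical. With the full label sets (7 emotions, 9 intents), the result is one 7 × 9 matrix per language subset.

```python
from typing import Dict, List, Sequence, Tuple


def count_matrix(
    pairs: Sequence[Tuple[str, str]],
    emotions: Sequence[str],
    intents: Sequence[str],
) -> List[List[int]]:
    """Return a len(emotions) x len(intents) matrix of sample counts.

    Entry [i][j] counts the utterances labelled with emotions[i] and intents[j].
    """
    e_index: Dict[str, int] = {e: i for i, e in enumerate(emotions)}
    t_index: Dict[str, int] = {t: j for j, t in enumerate(intents)}
    matrix = [[0] * len(intents) for _ in emotions]
    for emotion, intent in pairs:
        matrix[e_index[emotion]][t_index[intent]] += 1
    return matrix


def circle_radii(matrix: List[List[int]], max_radius: float = 1.0) -> List[List[float]]:
    """Scale counts so each circle's radius is proportional to its sample count.

    The largest cell gets max_radius; empty cells get radius 0 (no circle).
    """
    peak = max((c for row in matrix for c in row), default=0)
    if peak == 0:
        return [[0.0] * len(row) for row in matrix]
    return [[max_radius * c / peak for c in row] for row in matrix]


# Toy usage with a hypothetical subset of labels (not real MC-EIU counts).
emotions = ["Hap", "Sur"]
intents = ["Agr", "Sym", "Que"]
pairs = [("Hap", "Agr"), ("Hap", "Agr"), ("Hap", "Sym"), ("Sur", "Que")]

m = count_matrix(pairs, emotions, intents)
print(m)                # [[2, 1, 0], [0, 0, 1]]
print(circle_radii(m))  # [[1.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
```

Scaling the radius linearly, as the figure description states, makes a circle's area grow with the square of its count; scaling the radius by the square root of the count would instead make area proportional to count, which is a common alternative when plotting.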

Observations

  • Emotion‑Intent Relationship: Not strictly one‑to‑one. Different intents affect specific emotions to varying degrees and vice versa.
    • For example, "Hap‑Sym" appears less frequently than "Hap‑Agr", suggesting that the "Agreeing" intent more often drives the expression of happiness.
  • Dataset Differences: The English subset shows more complex emotion‑intent correlations than the Mandarin subset.
    • For instance, the "Sur" emotion is linked to all intent categories in the English data, while in Mandarin it is associated with only six intents ("Que", "Agr", "Con", "Sug", "Wis", and "Neu").
  • Model Performance: Models perform slightly worse on the English subset than on the Mandarin subset, which is attributed to this greater complexity.
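The observations above amount to simple queries on each 7 × 9 count matrix: which intents an emotion co-occurs with at all (the non-zero cells of its row, as in the "Sur" comparison), and how an emotion's samples split across intents (as in "Hap‑Sym" versus "Hap‑Agr"). A minimal Python sketch, assuming the matrix is stored as a list of rows ordered like the emotion and intent label lists; the function names and the toy matrix are hypothetical, not taken from the dataset release.

```python
from typing import Dict, List, Sequence


def linked_intents(
    matrix: List[List[int]],
    emotions: Sequence[str],
    intents: Sequence[str],
    emotion: str,
) -> List[str]:
    """Intents that co-occur at least once with the given emotion (non-zero cells in its row)."""
    row = matrix[list(emotions).index(emotion)]
    return [intent for intent, count in zip(intents, row) if count > 0]


def intent_shares(
    matrix: List[List[int]],
    emotions: Sequence[str],
    intents: Sequence[str],
    emotion: str,
) -> Dict[str, float]:
    """Fraction of the emotion's samples that fall under each intent (row-normalised counts)."""
    row = matrix[list(emotions).index(emotion)]
    total = sum(row)
    if total == 0:
        return {intent: 0.0 for intent in intents}
    return {intent: count / total for intent, count in zip(intents, row)}


# Hypothetical toy matrix (rows: emotions, columns: intents); not real MC-EIU counts.
emotions = ["Hap", "Sur"]
intents = ["Agr", "Sym", "Que", "Neu"]
matrix = [
    [6, 2, 0, 0],  # Hap
    [1, 1, 3, 3],  # Sur
]

print(linked_intents(matrix, emotions, intents, "Hap"))  # ['Agr', 'Sym']
print(len(linked_intents(matrix, emotions, intents, "Sur")) == len(intents))  # True
print(intent_shares(matrix, emotions, intents, "Hap"))
# {'Agr': 0.75, 'Sym': 0.25, 'Que': 0.0, 'Neu': 0.0}
```

Running the same two queries on the English and Mandarin matrices would reproduce comparisons of the kind listed above, such as counting how many intents a given emotion is linked to in each subset.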