MC-EIU

The MC‑EIU dataset, created by Inner Mongolia University and partner institutions, is a comprehensive multimodal dialogue dataset for joint emotion and intent understanding. It contains 4,970 dialogue video clips (56,012 utterances) covering 7 emotions and 9 intents, supporting text, acoustic, and visual modalities in both English and Mandarin. The dataset was built through data collection, preprocessing, and multi‑round annotation to ensure quality and diversity. MC‑EIU is aimed at human‑computer interaction research, enhancing machine understanding of human needs and empathy in conversational systems.

Source

arXiv

Created

Jul 3, 2024

Updated

Jul 4, 2024

Overview

Dataset description and usage context

MC‑EIU Dataset Analysis

Dataset Download

Baidu Cloud Link: Link
Extraction Code: Available from the authors by email once the paper has been accepted.

Dataset Analysis

Data Visualization

Figure 1: Visualization of the correlation between emotions and intents in the MC‑EIU dataset. Each circle represents the sample count for a specific "emotion‑intent" pair; a larger circle means more samples and therefore a stronger association between that emotion and that intent.

Correlation Analysis

Datasets: MC‑EIU‑English and MC‑EIU‑Mandarin
Matrix Representation: Two 7 × 9 matrices (7 emotions × 9 intents), one per language subset, in which each element is the sample count for an "emotion‑intent" pair.
Visualization Method: Circle radius proportional to sample count, plotted at the corresponding matrix position.
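
The matrix and circle-size scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' plotting code: the function names, the linear scaling to a maximum radius, and the toy annotations are all assumptions, and only label abbreviations that appear on this page are used.

```python
from collections import Counter

def cooccurrence_matrix(pairs, emotions, intents):
    """Count samples per (emotion, intent) pair; rows are emotions, columns are intents."""
    counts = Counter(pairs)
    return [[counts[(e, i)] for i in intents] for e in emotions]

def circle_radii(matrix, max_radius=1.0):
    """Scale counts linearly so that the largest count maps to max_radius (assumed scheme)."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[max_radius * count / peak for count in row] for row in matrix]

# Toy labels and annotations, invented for illustration only (not MC-EIU data).
emotions = ["Hap", "Sur"]
intents = ["Agr", "Sym", "Que"]
pairs = [("Hap", "Agr"), ("Hap", "Agr"), ("Hap", "Sym"), ("Sur", "Que")]

matrix = cooccurrence_matrix(pairs, emotions, intents)
radii = circle_radii(matrix)
```

With the full label sets, `matrix` would be 7 × 9, and each entry of `radii` could be passed to a plotting library as the circle size at the matching row and column.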

Observations

Emotion‑Intent Relationship: Not strictly one‑to‑one. Different intents affect specific emotions to varying degrees and vice versa.
- For example, "Hap‑Sym" appears less frequently than "Hap‑Agr", suggesting that the "Agreeing" intent more often drives the expression of happiness.
Dataset Differences: The English subset shows more complex emotion‑intent correlations than the Mandarin subset.
- For instance, the "Sur" emotion is linked to all intent categories in the English data, while in Mandarin it is associated with only six intents ("Que", "Agr", "Con", "Sug", "Wis", and "Neu").
Model Performance: Because of this added complexity in the English subset, models perform slightly worse on MC‑EIU‑English than on MC‑EIU‑Mandarin.
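
Observations such as "Sur is linked to only six intents" can be read directly off a count matrix by listing the intents with a nonzero entry in that emotion's row. The sketch below assumes a matrix laid out with emotions as rows and intents as columns; the helper name `linked_intents` and the counts are hypothetical and do not come from the dataset.

```python
def linked_intents(matrix, emotions, intents, emotion):
    """Return the intents that have at least one sample for the given emotion."""
    row = matrix[emotions.index(emotion)]
    return [intent for intent, count in zip(intents, row) if count > 0]

# Toy counts, invented for illustration only (not MC-EIU data).
emotions = ["Hap", "Sur"]
intents = ["Que", "Agr", "Sym"]
matrix = [
    [3, 12, 4],  # Hap: "Hap-Agr" is more frequent than "Hap-Sym"
    [5, 2, 0],   # Sur: no "Sur-Sym" samples in this toy example
]

sur_intents = linked_intents(matrix, emotions, intents, "Sur")
missing_for_sur = [intent for intent in intents if intent not in sur_intents]
```

Running the same check on both language subsets would show which emotion rows are sparser in one subset than in the other.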
