Datasets | JuheAPI

MM-Conv

Virtual Humans

Multimodal Dialogue

MM-Conv is a multimodal conversational dataset for virtual humans, created by the Royal Institute of Technology, Sweden. The dataset records dialogues between participants in the AI2-THOR physical simulator using a VR headset, comprising 6.7 hours of synchronized speech, motion capture, facial expressions, and gaze data. The creation process integrates virtual reality and motion capture technologies to ensure richness and structure. This dataset primarily supports the enhancement of gesture generation models in 3D scenes, aiming to address how to generate gestures more naturally and understand spatial information in task-oriented scenarios.

arXiv

View Details

Dataset Hub

Browse by Category

IVLLab/MultiDialog

MM-Conv