MM-Conv is a multimodal conversational dataset for virtual humans, created at the KTH Royal Institute of Technology in Sweden. It records dialogues between participants interacting in the AI2-THOR simulator through VR headsets, and comprises 6.7 hours of synchronized speech, motion capture, facial expression, and gaze data. By combining virtual reality with motion capture, the recording pipeline yields data that is both rich and well structured. The dataset primarily supports gesture-generation models grounded in 3D scenes, targeting the problem of generating natural gestures that reflect spatial information in task-oriented scenarios.
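Working with such a dataset typically means aligning modality streams recorded at different rates (e.g. audio features vs. motion capture) onto a common timeline. The sketch below is a minimal, hypothetical illustration of that step using linear interpolation; the stream names, sample rates, and `align_streams` helper are assumptions for illustration, not part of the MM-Conv release.

```python
import numpy as np

def align_streams(streams, rate_hz=30.0):
    """Resample each modality onto a shared timeline (hypothetical sketch).

    streams: dict mapping a modality name to (timestamps_sec, values),
             where values is a 1-D array sampled at those timestamps.
    Returns (timeline, dict of resampled value arrays).
    """
    # Use the overlapping time window shared by all modalities.
    start = max(ts[0] for ts, _ in streams.values())
    end = min(ts[-1] for ts, _ in streams.values())
    timeline = np.arange(start, end, 1.0 / rate_hz)
    # Linearly interpolate every stream onto the common timeline.
    aligned = {
        name: np.interp(timeline, ts, vals)
        for name, (ts, vals) in streams.items()
    }
    return timeline, aligned

# Toy example with two streams at different native rates.
speech_t = np.linspace(0.0, 2.0, 33)    # ~16 Hz speech feature
mocap_t = np.linspace(0.0, 2.0, 241)    # ~120 Hz motion capture channel
streams = {
    "speech_energy": (speech_t, np.sin(speech_t)),
    "wrist_height": (mocap_t, np.cos(mocap_t)),
}
timeline, aligned = align_streams(streams, rate_hz=30.0)
```

In practice the released data may already provide synchronized streams; this only shows the general resampling idea for multi-rate multimodal recordings.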