
MultiTalk

MultiTalk is a multilingual video dataset for enhancing cross-lingual 3D talking head generation.

Updated 6/20/2024

Description

Dataset Overview

Dataset Name

  • MultiTalk

Dataset Description

  • MultiTalk is a multilingual video dataset aimed at improving cross-lingual 3D talking head generation performance.

Dataset Access

Related Model Downloads

  • Running MultiTalk requires downloading the stage1 and stage2 model checkpoints, as well as the mean face template file in FLAME topology.
  • After downloading, the models should be placed in the ./checkpoints directory.
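Before training or testing, it can help to verify that all required files have landed in `./checkpoints`. The sketch below is a convenience check, not part of the MultiTalk code; the filenames in `REQUIRED` are placeholders and should be replaced with the actual names from the release.

```python
from pathlib import Path

# Placeholder filenames for illustration only; substitute the actual
# checkpoint and template names distributed with MultiTalk.
REQUIRED = ["stage1.pth", "stage2.pth", "flame_mean_template.npy"]

def missing_checkpoints(ckpt_dir="./checkpoints", required=REQUIRED):
    """Return the required files that are not yet present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in required if not (root / name).exists()]
```

If the returned list is empty, the expected layout is in place and the training and testing scripts below can be run.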

Dataset Evaluation

  • Lip Vertex Error (LVE): measures lip synchronization as the deviation of the predicted lip vertices from the ground truth.
  • Audio-Visual Lip Reading (AVLR): assesses the readability of the generated lips, using a pre-trained Audio-Visual Speech Recognition (AVSR) model.
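For intuition, LVE in the 3D talking-head literature is commonly computed as the maximal per-frame squared L2 error over the lip vertices, averaged across frames. The sketch below follows that common definition and is not necessarily identical to the MultiTalk evaluation code; `lip_idx` would come from the FLAME topology's lip-region vertex indices.

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """LVE: per frame, take the maximal squared L2 distance over the lip
    vertices, then average over frames.

    pred, gt: (T, V, 3) vertex sequences; lip_idx: lip-region vertex indices.
    """
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]      # (T, L, 3)
    per_vertex = np.sum(diff ** 2, axis=-1)             # squared L2 per vertex
    return float(np.mean(np.max(per_vertex, axis=-1)))  # max over lips, mean over frames
```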

Dataset Training and Testing

  • Training:
    • Discrete Motion Prior: train using `sh scripts/train_multi.sh MultiTalk_s1 config/multi/stage1.yaml multi s1`.
    • Speech-Driven Motion Synthesis: train using `sh scripts/train_multi.sh MultiTalk_s2 config/multi/stage2.yaml multi s2`.
  • Testing:
    • LVE: test using `sh scripts/test.sh MultiTalk_s2 config/multi/stage2.yaml vocaset s2`.
    • AVLR: evaluate using `python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert --work-dir ./avlr --language ${language} --model-name MultiTalk --exp-name ${exp_name}`.
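Since the AVLR command is parameterized by `${language}` and `${exp_name}`, sweeping it across languages is easiest done programmatically. The sketch below only assembles the argv list from the command as written above; it does not add or guess any flags, and the resulting list can be passed to `subprocess.run`.

```python
def avlr_command(language, exp_name,
                 avhubert_path="./av_hubert/avhubert",
                 work_dir="./avlr", model_name="MultiTalk"):
    """Build the AVLR evaluation command as an argv list for subprocess.run."""
    return ["python", "eval_avlr/eval_avlr.py",
            "--avhubert-path", avhubert_path,
            "--work-dir", work_dir,
            "--language", language,
            "--model-name", model_name,
            "--exp-name", exp_name]
```

A loop such as `for lang in languages: subprocess.run(avlr_command(lang, exp_name), check=True)` would then evaluate each language in turn.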



Topics

Cross-Language Communication
3D Talking Head Generation

Source

Organization: GitHub

Created: 6/15/2024
