
MultiTalk

The MultiTalk dataset is a multilingual video dataset used to enhance cross-lingual 3D talking head generation.

Source
GitHub
Created
Jun 15, 2024
Updated
Jun 20, 2024
Dataset Overview

Dataset Name

  • MultiTalk

Dataset Description

  • MultiTalk is a multilingual video dataset aimed at improving cross-lingual 3D talking head generation performance.

Dataset Access

Related Model Downloads

  • Running MultiTalk requires downloading stage1 and stage2 models, as well as the average facial template file from the FLAME topology.
  • After downloading, the models should be placed in the ./checkpoints directory.
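The placement step above can be sketched as follows. The file names are hypothetical, since the actual download links and names come from the MultiTalk repository; only the target directory (./checkpoints) is stated in the source.

```shell
# Prepare the checkpoint directory expected by the training/testing scripts.
mkdir -p ./checkpoints

# Move the downloaded files into place (names below are placeholders --
# substitute the real stage1/stage2 model files and FLAME mean-face template):
# mv ~/Downloads/stage1_model.pth ./checkpoints/
# mv ~/Downloads/stage2_model.pth ./checkpoints/
# mv ~/Downloads/flame_mean_template.ply ./checkpoints/

# Verify the directory contents before running any script.
ls ./checkpoints
```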

Dataset Evaluation

  • Lip Vertex Error (LVE): measures how far the generated lip vertices deviate from ground truth, commonly computed as the maximal per-frame L2 error over the lip vertices, averaged across frames.
  • Audio-Visual Lip Reading (AVLR): assesses lip readability, requiring a pre‑trained Audio‑Visual Speech Recognition (AVSR) model.
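The page does not spell out how LVE is computed; a minimal sketch of the common formulation (maximal per-frame squared L2 error over the lip vertices, averaged across frames) is below. The lip-vertex index set is a hypothetical placeholder; in practice it comes from the FLAME topology.

```python
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """Common LVE formulation: max per-frame squared L2 error over lip
    vertices, averaged across frames.

    pred, gt: (T, V, 3) vertex sequences (T frames, V vertices).
    lip_idx:  indices of the lip vertices (placeholder -- the real index
              set is defined by the FLAME mesh topology).
    """
    diff = pred[:, lip_idx] - gt[:, lip_idx]        # (T, L, 3) displacement
    per_vertex = np.sum(diff ** 2, axis=-1)         # squared L2 per lip vertex
    per_frame = np.max(per_vertex, axis=1)          # worst lip vertex per frame
    return float(np.mean(per_frame))                # average over frames
```

Identical sequences yield an LVE of 0; the score grows with the worst lip-vertex deviation in each frame.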

Dataset Training and Testing

  • Training:
    • Discrete Motion Prior: train using the command sh scripts/train_multi.sh MultiTalk_s1 config/multi/stage1.yaml multi s1.
    • Speech-Driven Motion Synthesis: train using the command sh scripts/train_multi.sh MultiTalk_s2 config/multi/stage2.yaml multi s2.
  • Testing:
    • LVE: test using the command sh scripts/test.sh MultiTalk_s2 config/multi/stage2.yaml vocaset s2.
    • AVLR: evaluate using the command python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert --work-dir ./avlr --language ${language} --model-name MultiTalk --exp-name ${exp_name}.
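Since the AVLR command takes the target language as a parameter, evaluating several languages can be scripted in one loop. This is a dry-run sketch: the language list and the exp-name pattern are assumptions, and the leading echo only prints each command so it can be inspected before running.

```shell
# Dry run: print the AVLR evaluation command for each language.
# Remove the leading "echo" to actually execute; adjust languages
# and the exp-name pattern (both hypothetical) to your experiments.
languages="english spanish french"
for language in $languages; do
  echo python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert \
      --work-dir ./avlr --language "$language" \
      --model-name MultiTalk --exp-name "multitalk_${language}"
done
```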