Dataset asset · Open Source Community · Cross-Language Communication · 3D Talking Head Generation
MultiTalk
MultiTalk is a multilingual video dataset for improving cross-lingual 3D talking head generation.
Source
GitHub
Created
Jun 15, 2024
Updated
Jun 20, 2024
Dataset Overview
Dataset Name
- MultiTalk
Dataset Description
- MultiTalk is a multilingual video dataset aimed at improving cross-lingual 3D talking head generation performance.
Dataset Access
- For detailed acquisition and usage instructions, refer to the MultiTalk_dataset/README.md.
Related Model Downloads
- Running MultiTalk requires the stage1 and stage2 models, plus the mean face template file in FLAME topology.
- stage1 model: Download Link
- stage2 model: Download Link
- template file: Download Link
- After downloading, place the models in the ./checkpoints directory.
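Before training or evaluation, it can help to confirm that the downloaded files really are in ./checkpoints. The following is a minimal sketch, not part of the MultiTalk repository. The file names in `EXPECTED_FILES` are hypothetical placeholders; substitute the names of the files you actually downloaded.

```python
from pathlib import Path

# Hypothetical file names, for illustration only. Replace them with the
# names of the stage1 model, stage2 model, and template file you downloaded.
EXPECTED_FILES = ["stage1.pth.tar", "stage2.pth.tar", "FLAME_sample.ply"]


def missing_checkpoints(checkpoint_dir="./checkpoints", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing from ./checkpoints:", ", ".join(missing))
    else:
        print("All expected files found.")
```

Running the script from the repository root lists any file that still needs to be downloaded or moved.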
Dataset Evaluation
- Lip Vertex Error (LVE): measures how far the generated lip vertices deviate from the ground truth (lower is better).
- Audio-Visual Lip Reading (AVLR): assesses lip readability, requiring a pre‑trained Audio‑Visual Speech Recognition (AVSR) model.
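The listing does not spell out how LVE is computed. In prior speech-driven 3D face animation work it is commonly defined as the maximum L2 error over the lip vertices in each frame, averaged over all frames. The sketch below follows that definition in plain Python. It is an assumption about the metric, not the repository's evaluation code, which may differ in details such as the lip vertex set or the use of squared distances. The function name and the `lip_indices` argument are hypothetical.

```python
import math


def lip_vertex_error(pred, target, lip_indices):
    """Mean over frames of the maximum lip-vertex L2 distance.

    pred, target: sequences of frames; each frame is a sequence of (x, y, z)
    vertices. lip_indices: indices of the vertices that belong to the lips.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    if not pred or not lip_indices:
        raise ValueError("need at least one frame and one lip vertex")
    per_frame_max = []
    for pred_frame, target_frame in zip(pred, target):
        # Largest Euclidean distance among the lip vertices of this frame.
        per_frame_max.append(
            max(math.dist(pred_frame[i], target_frame[i]) for i in lip_indices)
        )
    return sum(per_frame_max) / len(per_frame_max)


# Toy example: 2 frames, 3 vertices each; vertices 1 and 2 play the role of lips.
zero = (0.0, 0.0, 0.0)
target = [[zero, zero, zero], [zero, zero, zero]]
pred = [
    [zero, (3.0, 4.0, 0.0), (1.0, 0.0, 0.0)],  # frame 0: max lip error is 5
    [zero, zero, (0.0, 0.0, 1.0)],             # frame 1: max lip error is 1
]
print(lip_vertex_error(pred, target, lip_indices=[1, 2]))  # 3.0
```

Taking the per-frame maximum rather than the mean makes the score sensitive to the single worst lip vertex in each frame.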
Dataset Training and Testing
- Training:
  - Discrete Motion Prior: train using the command
    sh scripts/train_multi.sh MultiTalk_s1 config/multi/stage1.yaml multi s1
  - Speech-Driven Motion Synthesis: train using the command
    sh scripts/train_multi.sh MultiTalk_s2 config/multi/stage2.yaml multi s2
- Testing:
  - LVE: test using the command
    sh scripts/test.sh MultiTalk_s2 config/multi/stage2.yaml vocaset s2
  - AVLR: evaluate using the command
    python eval_avlr/eval_avlr.py --avhubert-path ./av_hubert/avhubert --work-dir ./avlr --language ${language} --model-name MultiTalk --exp-name ${exp_name}
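The two training commands differ only in the stage number, and the AVLR command takes a language and an experiment name. A small Python sketch can assemble them as argument lists for scripted runs. The helper names are hypothetical, and the example values "English" and "MultiTalk_s2" are placeholders for ${language} and ${exp_name}; the accepted values are defined by the repository, not here.

```python
import shlex


def train_command(stage):
    """Build the training command for stage 1 or stage 2 as an argument list."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    return [
        "sh", "scripts/train_multi.sh",
        f"MultiTalk_s{stage}",
        f"config/multi/stage{stage}.yaml",
        "multi", f"s{stage}",
    ]


def avlr_command(language, exp_name):
    """Build the AVLR evaluation command as an argument list."""
    return [
        "python", "eval_avlr/eval_avlr.py",
        "--avhubert-path", "./av_hubert/avhubert",
        "--work-dir", "./avlr",
        "--language", language,
        "--model-name", "MultiTalk",
        "--exp-name", exp_name,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; the argument values
    # below are placeholders.
    print(shlex.join(train_command(1)))
    print(shlex.join(train_command(2)))
    print(shlex.join(avlr_command("English", "MultiTalk_s2")))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the repository root.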