One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
Authors: Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync. ... We conduct quantitative evaluations on several metrics that have been widely used in previous methods. ... We compare our method with recent state-of-the-art methods, including Wav2Lip (Prajwal et al. 2020), MakeItTalk (Zhou et al. 2020), Audio2Head (Wang et al. 2021), FGTF (Zhang et al. 2021c), and PC-AVS (Zhou et al. 2021). ... We conduct an ablation study with 7 variants: |
| Researcher Affiliation | Collaboration | Suzhen Wang (1), Lincheng Li (1), Yu Ding (1)*, Xin Yu (2); (1) Virtual Human Group, Netease Fuxi AI Lab; (2) University of Technology Sydney; {wangsuzhen, lilincheng, dingyu01}@corp.netease.com; xin.yu@uts.edu.au |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/FuxiVirtualHuman/AAAI22-one-shot-talking-face |
| Open Datasets | Yes | we employ two in-the-wild audio-visual datasets to evaluate our method, HDTF (Zhang et al. 2021c) and VoxCeleb2 (Chung, Nagrani, and Zisserman 2018). |
| Dataset Splits | No | The paper mentions training, but does not provide specific details on how the datasets are split into training, validation, and test sets with percentages or counts. It mentions using HDTF and VoxCeleb2 for evaluation, but not their specific splits. |
| Hardware Specification | Yes | In our experiments, T is set to 24 (on RTX 3090) |
| Software Dependencies | No | The paper mentions "PyTorch" and the "Adam optimizer" but does not specify version numbers or any other software dependencies with pinned versions. |
| Experiment Setup | Yes | Eavct is trained on 4 GPUs for about 5 days using the batched sequential training mechanism, with an initial learning rate of 2e-5 and a weight decay of 2e-7. ... Eh is trained on a single GPU for about 12 hours with a learning rate of 1e-4. ... In our experiments, T is set to 24 (on RTX 3090); λsync, λv, λPeq and λJeq are set to 10, 1, 10, 10, respectively. (A minimal configuration sketch of these settings follows the table.) |
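
The quoted experiment setup lists only numeric hyperparameters (learning rates, weight decay, sequence length T, and the four loss weights). The sketch below shows how those values could be wired into a PyTorch training configuration. Only the numbers are taken from the paper; the module placeholders (`eavct`, `eh`), the additive form of the total loss, and all variable names are assumptions made for illustration, not the authors' released code.

```python
import torch

# Hyperparameters quoted from the paper's experiment setup.
SEQ_LEN_T   = 24     # frames per training clip (fits an RTX 3090)
LR_EAVCT    = 2e-5   # initial learning rate for Eavct
WD_EAVCT    = 2e-7   # weight decay for Eavct
LR_EH       = 1e-4   # learning rate for Eh
LAMBDA_SYNC = 10.0   # loss weights: λsync, λv, λPeq, λJeq
LAMBDA_V    = 1.0
LAMBDA_PEQ  = 10.0
LAMBDA_JEQ  = 10.0

# Plain placeholders stand in for the paper's two sub-networks,
# whose architectures are not reproduced here.
eavct = torch.nn.Linear(80, 64)  # placeholder for Eavct
eh    = torch.nn.Linear(80, 6)   # placeholder for Eh

# Separate Adam optimizers, matching the per-module settings quoted above.
opt_eavct = torch.optim.Adam(eavct.parameters(), lr=LR_EAVCT, weight_decay=WD_EAVCT)
opt_eh    = torch.optim.Adam(eh.parameters(), lr=LR_EH)

def total_loss(l_sync, l_v, l_p_eq, l_j_eq):
    """Weighted sum of the four loss terms named in the paper.
    The additive combination is an assumption; only the weights are quoted."""
    return (LAMBDA_SYNC * l_sync + LAMBDA_V * l_v
            + LAMBDA_PEQ * l_p_eq + LAMBDA_JEQ * l_j_eq)
```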