Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
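As a rough illustration of the validation step mentioned in this notice, the sketch below measures agreement between LLM-assigned labels and a manually labeled reference set. The function name and sample data are hypothetical; the actual pipeline and accuracy metrics are those reported in [1].

```python
# Hedged sketch of the validation step in the notice above: measuring how often
# the LLM's label for a reproducibility variable matches a manual annotation.
# All names and sample data are hypothetical illustrations, not the real
# pipeline; see [1] for the actual methodology and accuracy figures.

def label_agreement(llm_labels: list[str], manual_labels: list[str]) -> float:
    """Fraction of items where the LLM label matches the manual label."""
    if len(llm_labels) != len(manual_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(l == m for l, m in zip(llm_labels, manual_labels))
    return matches / len(manual_labels)

# Hypothetical labels for one variable (e.g. "Open Source Code") across 5 papers.
llm = ["Yes", "No", "Yes", "Yes", "No"]
manual = ["Yes", "No", "No", "Yes", "No"]
print(f"agreement: {label_agreement(llm, manual):.2f}")  # agreement: 0.80
```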
One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
Authors: Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022, pp. 2531-2539 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync. ... We conduct quantitative evaluations on several metrics that have been widely used in previous methods. ... We compare our method with recent state-of-the-art methods, including Wav2Lip (Prajwal et al. 2020), MakeItTalk (Zhou et al. 2020), Audio2Head (Wang et al. 2021), FGTF (Zhang et al. 2021c), and PC-AVS (Zhou et al. 2021). ... We conduct an ablation study with 7 variants: |
| Researcher Affiliation | Collaboration | Suzhen Wang¹, Lincheng Li¹, Yu Ding¹*, Xin Yu². ¹Virtual Human Group, Netease Fuxi AI Lab; ²University of Technology Sydney |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/FuxiVirtualHuman/AAAI22-one-shot-talking-face |
| Open Datasets | Yes | we employ two in-the-wild audio-visual datasets to evaluate our method, HDTF (Zhang et al. 2021c) and VoxCeleb2 (Chung, Nagrani, and Zisserman 2018). |
| Dataset Splits | No | The paper mentions training but does not specify how the datasets are split into training, validation, and test sets, with percentages or counts. It mentions using HDTF and VoxCeleb2 for evaluation, but not their specific splits. |
| Hardware Specification | Yes | In our experiments, T is set to 24 (on RTX 3090) |
| Software Dependencies | No | The paper mentions "PyTorch" and "Adam optimizer" but does not specify their version numbers or any other software dependencies with specific versions. |
| Experiment Setup | Yes | E_avct is trained on 4 GPUs for about 5 days using the batched sequential training mechanism, with an initial learning rate of 2e-5 and a weight decay of 2e-7. ... E_h is trained on a single GPU for about 12 hours with a learning rate of 1e-4. ... In our experiments, T is set to 24 (on RTX 3090), and λ_sync, λ_v, λ_Peq, and λ_Jeq are set to 10, 1, 10, and 10, respectively. |
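For concreteness, here is a minimal PyTorch sketch of the quoted training configuration (Adam with learning rate 2e-5, weight decay 2e-7, and the four loss weights). The network module and the individual loss terms are hypothetical placeholders; only the hyperparameter values come from the quoted text.

```python
# Minimal PyTorch sketch of the quoted setup. Only lr, weight decay, and the
# four λ values come from the paper's quoted text; the module and the loss
# terms are hypothetical stand-ins, not the authors' actual implementation.
import torch

e_avct = torch.nn.Linear(256, 256)  # placeholder for the paper's E_avct network
optimizer = torch.optim.Adam(e_avct.parameters(), lr=2e-5, weight_decay=2e-7)

# Loss weights as quoted: λ_sync, λ_v, λ_Peq, λ_Jeq = 10, 1, 10, 10.
LAMBDA_SYNC, LAMBDA_V, LAMBDA_PEQ, LAMBDA_JEQ = 10.0, 1.0, 10.0, 10.0

def total_loss(l_sync: torch.Tensor, l_v: torch.Tensor,
               l_peq: torch.Tensor, l_jeq: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the four training losses (term names hypothetical)."""
    return (LAMBDA_SYNC * l_sync + LAMBDA_V * l_v
            + LAMBDA_PEQ * l_peq + LAMBDA_JEQ * l_jeq)
```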