Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
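As a rough illustration of the validation step mentioned in this notice, the sketch below measures agreement between LLM-assigned labels and a manually labeled reference set. The function name and sample data are hypothetical; the actual pipeline and accuracy metrics are those reported in [1].

```python
# Hedged sketch of the validation step in the notice above: measuring how often
# the LLM's label for a reproducibility variable matches a manual annotation.
# All names and sample data are hypothetical illustrations, not the real
# pipeline; see [1] for the actual methodology and accuracy figures.

def label_agreement(llm_labels: list[str], manual_labels: list[str]) -> float:
    """Fraction of items where the LLM label matches the manual label."""
    if len(llm_labels) != len(manual_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(l == m for l, m in zip(llm_labels, manual_labels))
    return matches / len(manual_labels)

# Hypothetical labels for one variable (e.g. "Open Source Code") across 5 papers.
llm = ["Yes", "No", "Yes", "Yes", "No"]
manual = ["Yes", "No", "No", "Yes", "No"]
print(f"agreement: {label_agreement(llm, manual):.2f}")  # agreement: 0.80
```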
One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning
Authors: Suzhen Wang, Lincheng Li, Yu Ding, Xin Yu
AAAI 2022, pp. 2531-2539 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync. ... We conduct quantitative evaluations on several metrics that have been widely used in previous methods. ... We compare our method with recent state-of-the-art methods, including Wav2Lip (Prajwal et al. 2020), MakeItTalk (Zhou et al. 2020), Audio2Head (Wang et al. 2021), FGTF (Zhang et al. 2021c), and PC-AVS (Zhou et al. 2021). ... We conduct an ablation study with 7 variants: |
| Researcher Affiliation | Collaboration | Suzhen Wang¹, Lincheng Li¹, Yu Ding¹*, Xin Yu². ¹Virtual Human Group, Netease Fuxi AI Lab; ²University of Technology Sydney |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/FuxiVirtualHuman/AAAI22-one-shot-talking-face |
| Open Datasets | Yes | we employ two in-the-wild audio-visual datasets to evaluate our method, HDTF (Zhang et al. 2021c) and VoxCeleb2 (Chung, Nagrani, and Zisserman 2018). |
| Dataset Splits | No | The paper mentions training but does not specify how the datasets are split into training, validation, and test sets, with percentages or counts. It mentions using HDTF and VoxCeleb2 for evaluation, but not their specific splits. |
| Hardware Specification | Yes | In our experiments, T is set to 24 (on RTX 3090) |
| Software Dependencies | No | The paper mentions "PyTorch" and "Adam optimizer" but does not specify their version numbers or any other software dependencies with specific versions. |
| Experiment Setup | Yes | E_avct is trained on 4 GPUs for about 5 days using the batched sequential training mechanism, with an initial learning rate of 2e-5 and a weight decay of 2e-7. ... E_h is trained on a single GPU for about 12 hours with a learning rate of 1e-4. ... In our experiments, T is set to 24 (on RTX 3090), and λ_sync, λ_v, λ_Peq, and λ_Jeq are set to 10, 1, 10, and 10, respectively. |
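For concreteness, here is a minimal PyTorch sketch of the quoted training configuration (Adam with learning rate 2e-5, weight decay 2e-7, and the four loss weights). The network module and the individual loss terms are hypothetical placeholders; only the hyperparameter values come from the quoted text.

```python
# Minimal PyTorch sketch of the quoted setup. Only lr, weight decay, and the
# four λ values come from the paper's quoted text; the module and the loss
# terms are hypothetical stand-ins, not the authors' actual implementation.
import torch

e_avct = torch.nn.Linear(256, 256)  # placeholder for the paper's E_avct network
optimizer = torch.optim.Adam(e_avct.parameters(), lr=2e-5, weight_decay=2e-7)

# Loss weights as quoted: λ_sync, λ_v, λ_Peq, λ_Jeq = 10, 1, 10, 10.
LAMBDA_SYNC, LAMBDA_V, LAMBDA_PEQ, LAMBDA_JEQ = 10.0, 1.0, 10.0, 10.0

def total_loss(l_sync: torch.Tensor, l_v: torch.Tensor,
               l_peq: torch.Tensor, l_jeq: torch.Tensor) -> torch.Tensor:
    """Weighted sum of the four training losses (term names hypothetical)."""
    return (LAMBDA_SYNC * l_sync + LAMBDA_V * l_v
            + LAMBDA_PEQ * l_peq + LAMBDA_JEQ * l_jeq)
```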