Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

Authors: Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He

IJCAI 2020

Reproducibility Variable Result LLM Response
Research Type Experimental "Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations." (Section 4, Experiments; 4.1, Dataset and Metrics:) "We evaluate our method on prevalent benchmark datasets LRW [Chung and Zisserman, 2016] and GRID [Cooke et al., 2006]. ... We use common reconstruction metrics such as PSNR and SSIM [Wang et al., 2004] to evaluate the quality of the synthesized talking faces. Furthermore, we use Landmark Distance (LMD) to evaluate the accuracy of the generated lip by calculating the landmark distance between the generated video and the original video."
Researcher Affiliation Academia "Hao Zhu1,2, Huaibo Huang2,3, Yi Li2,3, Aihua Zheng1 and Ran He2,3. 1School of Computer Science and Technology, Anhui University, Hefei, China. 2NLPR&CEBSIT&CRIPAC, Institute of Automation, CAS, Beijing, China. 3School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China. haozhu96@gmail.com, {huaibo.huang,yi.li}@cripac.ia.ac.cn, ahzheng214@ahu.edu.cn, rhe@nlpr.ia.ac.cn"
Pseudocode No The paper describes methods and architectures but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository.
Open Datasets Yes We evaluate our method on prevalent benchmark datasets LRW [Chung and Zisserman, 2016] and GRID [Cooke et al., 2006].
Dataset Splits No The paper does not explicitly provide details about training, validation, and test dataset splits (e.g., percentages, sample counts, or specific predefined split information).
Hardware Specification No The paper does not provide specific details regarding the hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments.
Software Dependencies No The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes "Our full model is optimized according to the following objective function: L = L_GAN + λ1 L_perc + λ2 L_lip + λ3 L_mi. Specifically, in the training stage, we start from relatively high attention (rate = 0.7–0.9), and progressively decrease it to relatively low attention (rate = 0.1–0.3), then we fix the rate to 1 for the last few epochs."
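The quoted experiment setup (a weighted four-term objective plus a progressive attention-rate schedule) can be sketched as below. This is a hypothetical illustration: the loss-term implementations, λ weights, and the exact decay rule are not specified in the paper excerpt and are placeholders here.

```python
# Sketch of the quoted training setup: a combined objective
# L = L_GAN + λ1*L_perc + λ2*L_lip + λ3*L_mi, and an attention rate
# that decays from a high range (0.7-0.9) to a low range (0.1-0.3),
# then is fixed to 1 for the last few epochs. All numeric choices
# (linear decay, endpoint values, λ values) are assumptions.

def attention_rate(epoch, total_epochs, high=0.8, low=0.2, final_epochs=5):
    """Linearly decay the attention rate from `high` to `low`,
    then fix it to 1.0 for the last `final_epochs` epochs."""
    if epoch >= total_epochs - final_epochs:
        return 1.0
    decay_span = max(total_epochs - final_epochs - 1, 1)
    frac = min(epoch / decay_span, 1.0)
    return high + frac * (low - high)

def total_loss(l_gan, l_perc, l_lip, l_mi, lam1=1.0, lam2=1.0, lam3=1.0):
    """Weighted sum of the four loss terms (λ values are placeholders)."""
    return l_gan + lam1 * l_perc + lam2 * l_lip + lam3 * l_mi
```

For example, with a 100-epoch run the rate starts near 0.8, reaches 0.2 just before the final epochs, and is pinned to 1.0 at the end, matching the schedule described in the quote.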
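For reference, the evaluation metrics named in the report (PSNR and LMD) admit short stand-alone implementations. This is a minimal sketch assuming 8-bit pixel values and precomputed (x, y) landmark coordinates; SSIM is omitted because it requires windowed local statistics.

```python
# Minimal reference implementations of the quoted metrics:
# PSNR (reconstruction quality) and Landmark Distance / LMD
# (lip accuracy via landmark distances between generated and
# ground-truth frames). Inputs are plain Python sequences.
import math

def psnr(frame_a, frame_b, max_val=255.0):
    """Peak signal-to-noise ratio between two flat pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

def lmd(landmarks_gen, landmarks_ref):
    """Mean Euclidean distance between matched (x, y) landmark pairs."""
    dists = [math.hypot(xg - xr, yg - yr)
             for (xg, yg), (xr, yr) in zip(landmarks_gen, landmarks_ref)]
    return sum(dists) / len(dists)
```

In the paper's protocol, LMD is computed between the generated video and the original video, so lower LMD and higher PSNR indicate better synthesis.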