Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning
Authors: Hao Zhu, Huaibo Huang, Yi Li, Aihua Zheng, Ran He
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations." From Section 4.1 (Dataset and Metrics): "We evaluate our method on prevalent benchmark datasets LRW [Chung and Zisserman, 2016] and GRID [Cooke et al., 2006]. ... We use common reconstruction metrics such as PSNR and SSIM [Wang et al., 2004] to evaluate the quality of the synthesized talking faces. Furthermore, we use Landmark Distance (LMD) to evaluate the accuracy of the generated lip by calculating the landmark distance between the generated video and the original video." A hedged sketch of these metrics follows the table. |
| Researcher Affiliation | Academia | Hao Zhu¹﹐², Huaibo Huang²﹐³, Yi Li²﹐³, Aihua Zheng¹ and Ran He²﹐³. ¹School of Computer Science and Technology, Anhui University, Hefei, China; ²NLPR&CEBSIT&CRIPAC, Institute of Automation, CAS, Beijing, China; ³School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China. haozhu96@gmail.com, {huaibo.huang,yi.li}@cripac.ia.ac.cn, ahzheng214@ahu.edu.cn, rhe@nlpr.ia.ac.cn |
| Pseudocode | No | The paper describes methods and architectures but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | We evaluate our method on prevalent benchmark datasets LRW [Chung and Zisserman, 2016] and GRID [Cooke et al., 2006]. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits (e.g., percentages, sample counts, or specific predefined split information). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | "Our full model is optimized according to the following objective function: $\mathcal{L} = \mathcal{L}_{GAN} + \lambda_1 \mathcal{L}_{perc} + \lambda_2 \mathcal{L}_{lip} + \lambda_3 \mathcal{L}_{mi}$." "Specifically, in the training stage, we start from relatively high attention (rate = 0.7 ∼ 0.9), and progressively decrease it to relatively low attention (rate = 0.1 ∼ 0.3), then we fix the rate to 1 for the last few epochs." A hedged sketch of this objective and schedule follows the table. |
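
The Research Type row quotes three evaluation metrics: PSNR, SSIM, and Landmark Distance (LMD). Below is a minimal sketch of how they could be computed, assuming scikit-image (≥ 0.19) for PSNR/SSIM and pre-extracted landmark arrays for LMD; the paper does not name its landmark detector or its averaging conventions, so those choices are assumptions.

```python
# Minimal sketch of the quoted evaluation metrics; requires scikit-image >= 0.19.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def video_psnr_ssim(generated, reference):
    """Average frame-wise PSNR/SSIM over two uint8 videos shaped (T, H, W, 3)."""
    psnrs, ssims = [], []
    for gen, ref in zip(generated, reference):
        psnrs.append(peak_signal_noise_ratio(ref, gen, data_range=255))
        ssims.append(structural_similarity(ref, gen, channel_axis=-1, data_range=255))
    return float(np.mean(psnrs)), float(np.mean(ssims))

def landmark_distance(gen_landmarks, ref_landmarks):
    """LMD: mean Euclidean distance between corresponding landmarks.

    Both arrays are shaped (T, K, 2); landmark extraction (e.g. with an
    off-the-shelf face-alignment model) is assumed to happen upstream,
    since the paper does not specify its detector.
    """
    return float(np.linalg.norm(gen_landmarks - ref_landmarks, axis=-1).mean())
```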
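
The Experiment Setup row quotes a weighted loss and a coarse attention-rate schedule but no concrete values. The sketch below shows one plausible reading: the weighted sum is taken directly from the quoted objective, while the linear decay between the midpoints of the quoted ranges, the default weights, and the `final_epochs` cutoff are all illustrative assumptions.

```python
# Minimal sketch of the training objective and attention-rate schedule.
# Only the weighted sum and the high -> low -> 1 schedule come from the
# paper; the lambda defaults, linear decay, and `final_epochs` are assumptions.

def total_loss(l_gan, l_perc, l_lip, l_mi, lam1=1.0, lam2=1.0, lam3=1.0):
    """L = L_GAN + lambda1*L_perc + lambda2*L_lip + lambda3*L_mi (weights hypothetical)."""
    return l_gan + lam1 * l_perc + lam2 * l_lip + lam3 * l_mi

def attention_rate(epoch, total_epochs, final_epochs=5):
    """Decay the rate from ~0.8 (within 0.7-0.9) to ~0.2 (within 0.1-0.3),
    then fix it at 1 for the last `final_epochs` epochs, as the paper quotes.
    """
    if epoch >= total_epochs - final_epochs:
        return 1.0
    progress = epoch / max(total_epochs - final_epochs - 1, 1)
    return 0.8 - 0.6 * progress  # linear interpolation is an assumption
```

A curriculum like this starts training with heavier attention masking and relaxes it over time, which matches the quoted "relatively high" to "relatively low" progression, though the exact decay curve is not stated in the paper.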