Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation

Authors: Yuhui Deng, Yuqin Lu, Yangyang Xu, Yongwei Nie, Shengfeng He

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that our method achieves state-of-the-art results, preserving source identity, maintaining fine-grained facial details, and capturing nuanced facial expressions with remarkable accuracy. We conduct extensive experiments on competitive face video benchmarks such as VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and CelebV (Wu et al. 2018). Experimental results demonstrate the effectiveness of our approach in addressing extreme poses and occlusion. Furthermore, our method significantly outperforms state-of-the-art techniques, as evidenced by both qualitative and quantitative evaluations.
Researcher Affiliation Academia South China University of Technology; Singapore Management University; Harbin Institute of Technology (Shenzhen)
Pseudocode No The paper describes the methodology using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper does not explicitly state that the authors are releasing their source code, nor does it provide a link to a code repository. It mentions using 'official code implementation and publicly available pre-trained models' for competitors, but not for their own method.
Open Datasets Yes In our experiments, we use two commonly used datasets for the validation of talking head generation: VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and the CelebV dataset (Wu et al. 2018).
Dataset Splits Yes We follow the same data pre-processing protocol and train-test split strategy in (Siarohin et al. 2019b; Hong et al. 2022) for evaluation. This dataset consists of 1,300 image pairs, each containing a source image with self-occlusion, manually curated from the VoxCeleb1 test set.
Hardware Specification Yes Our model is trained for 100 epochs using two RTX 4090 GPUs in an end-to-end training manner, taking up to approximately 5 days in total.
Software Dependencies No The paper mentions using 'Adam optimizer (Kingma and Ba 2015)' and 'Hourglass network architecture for keypoint estimation (Newell, Yang, and Deng 2016)' but does not specify version numbers for programming languages, libraries, or frameworks.
Experiment Setup Yes Our model is trained for 100 epochs using two RTX 4090 GPUs in an end-to-end training manner, taking up to approximately 5 days in total. The Adam optimizer (Kingma and Ba 2015) is adopted with a learning rate of 2e-4, β1 = 0.5, and β2 = 0.999.
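The quoted optimizer settings (learning rate 2e-4, β1 = 0.5, β2 = 0.999) can be illustrated with a minimal, self-contained sketch of one Adam update step (Kingma and Ba 2015). This is an illustrative re-implementation of the standard Adam rule, not the authors' code; the function name and scalar interface are assumptions for the example.

```python
def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at 1-based timestep t.

    Defaults match the hyperparameters quoted in the paper's setup.
    Returns the updated (param, m, v) triple.
    """
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

In practice the same configuration would be expressed in one line of a deep-learning framework, e.g. `torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))`; the low β1 = 0.5 is a common choice in adversarial training, where a short gradient memory stabilizes the generator/discriminator dynamics.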