Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Occlusion-Insensitive Talking Head Video Generation via Facelet Compensation
Authors: Yuhui Deng, Yuqin Lu, Yangyang Xu, Yongwei Nie, Shengfeng He
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves state-of-the-art results, preserving source identity, maintaining fine-grained facial details, and capturing nuanced facial expressions with remarkable accuracy. We conduct extensive experiments on competitive face video benchmarks such as VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and CelebV (Wu et al. 2018). Experimental results demonstrate the effectiveness of our approach in addressing extreme poses and occlusion. Furthermore, our method significantly outperforms state-of-the-art techniques, as evidenced by both qualitative and quantitative evaluations. |
| Researcher Affiliation | Academia | 1 South China University of Technology; 2 Singapore Management University; 3 Harbin Institute of Technology (Shenzhen) |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their source code, nor does it provide a link to a code repository. It mentions using 'official code implementation and publicly available pre-trained models' for competitors, but not for their own method. |
| Open Datasets | Yes | In our experiments, we use two commonly used datasets for the validation of talking head generation: VoxCeleb1 (Nagrani, Chung, and Zisserman 2017) and the CelebV dataset (Wu et al. 2018). |
| Dataset Splits | Yes | We follow the same data pre-processing protocol and train-test split strategy in (Siarohin et al. 2019b; Hong et al. 2022) for evaluation. This dataset consists of 1,300 image pairs, each containing a source image with self-occlusion, manually curated from the VoxCeleb1 test set. |
| Hardware Specification | Yes | Our model is trained for 100 epochs using two RTX 4090 GPUs in an end-to-end training manner, taking up to approximately 5 days in total. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma and Ba 2015)' and 'Hourglass network architecture for keypoint estimation (Newell, Yang, and Deng 2016)' but does not specify version numbers for programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | Our model is trained for 100 epochs using two RTX 4090 GPUs in an end-to-end training manner, taking up to approximately 5 days in total. The Adam optimizer (Kingma and Ba 2015) is adopted with a learning rate of 2e-4, β1 = 0.5, and β2 = 0.999. |
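For concreteness, the reported optimizer settings (Adam with learning rate 2e-4, β1 = 0.5, β2 = 0.999) can be sketched as a single Adam update step. This is a minimal pure-Python illustration of the Kingma and Ba (2015) update rule with those hyperparameters; the function name and single-scalar setup are illustrative assumptions, not the authors' implementation, which would use a deep learning framework's built-in optimizer.

```python
def adam_step(theta, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter, using the paper's hyperparameters."""
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# First update step (t = 1) on a toy scalar parameter:
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

Note that after bias correction the first step moves the parameter by roughly the learning rate, regardless of the gradient's magnitude; the relatively low β1 = 0.5 (versus the common default 0.9) shortens the momentum memory, a choice often seen in GAN-style image generation training.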