HifiHead: One-Shot High Fidelity Neural Head Synthesis with 3D Control

Authors: Feida Zhu, Junwei Zhu, Wenqing Chu, Ying Tai, Zhifeng Xie, Xiaoming Huang, Chengjie Wang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our method blends source appearance and target motion more accurately, along with more realistic results, than previous state-of-the-art approaches.
Researcher Affiliation | Collaboration | Feida Zhu1, Junwei Zhu1, Wenqing Chu1, Ying Tai1, Zhifeng Xie2, Xiaoming Huang1 and Chengjie Wang1 (1Youtu Lab, Tencent; 2Shanghai Film Academy, Shanghai University)
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | https://github.com/TencentYoutuResearch/HeadSynthesis-HifiHead
Open Datasets | Yes | We utilize the VoxCeleb [Nagrani et al., 2017] dataset, which consists of around 20K talking-head videos, to train our HifiHead network. Besides VoxCeleb, we randomly sample 1K images from the FFHQ [Karras et al., 2019] and CelebA-HQ [Karras et al., 2017] datasets as the source images to compare generalization capability with other methods.
Dataset Splits | No | The paper states "A total of 17,927 training videos and 491 testing videos are obtained." for VoxCeleb, but does not provide details on validation splits or specific percentages for any dataset.
Hardware Specification | Yes | It takes around 2 days to train HifiHead with 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using StyleGAN2 and D3DFR but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The mapping network depth is 3, as in [Karras et al., 2021]. The spatial feature encoder contains 7 down-sample convolutional layers. The learning rate is set to 0.001 for all trainable parameters. The batch size is set to 32. In a mini-batch, the ratio of same-identity data to cross-identity data is empirically set to 1:1.
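The reported experiment setup implies a mini-batch sampler that mixes same-identity pairs (source and target frames from one person's video) and cross-identity pairs (frames from different people) at a 1:1 ratio within a batch of 32. The paper does not publish its sampler, so the following is a minimal sketch under that assumption; `build_minibatch` and the `identities` mapping are hypothetical names introduced here for illustration.

```python
import random

BATCH_SIZE = 32       # batch size reported in the paper
LEARNING_RATE = 1e-3  # learning rate for all trainable parameters

def build_minibatch(identities, batch_size=BATCH_SIZE, seed=None):
    """Sample (source, target, kind) triples for one mini-batch.

    Half the pairs share an identity (reconstruction supervision);
    the other half pair different identities (reenactment supervision).
    `identities` maps an identity id to a list of its frame ids.
    Hypothetical sampler: the paper only states the 1:1 ratio.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    ids = list(identities)
    batch = []
    # Same-identity pairs: both frames come from one person's video.
    for _ in range(half):
        pid = rng.choice(ids)
        batch.append((rng.choice(identities[pid]),
                      rng.choice(identities[pid]), "same"))
    # Cross-identity pairs: frames come from two different people.
    for _ in range(batch_size - half):
        a, b = rng.sample(ids, 2)
        batch.append((rng.choice(identities[a]),
                      rng.choice(identities[b]), "cross"))
    rng.shuffle(batch)  # avoid ordering bias within the batch
    return batch
```

Keeping the ratio fixed per batch (rather than sampling pair type independently) guarantees both loss terms receive gradient signal in every step.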