Generalizable and Animatable Gaussian Head Avatar

Authors: Xuangeng Chu, Tatsuya Harada

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. |
| Researcher Affiliation | Academia | Xuangeng Chu, The University of Tokyo, xuangeng.chu@mi.t.u-tokyo.ac.jp; Tatsuya Harada, The University of Tokyo and RIKEN AIP, harada@mi.t.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes the model architecture and processes but does not include any explicitly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Code and demos are available at https://github.com/xg-chu/GAGAvatar. |
| Open Datasets | Yes | We use the VFHQ [Xie et al., 2022] dataset to train our model, which comprises clips from various interview scenarios. ... We also evaluate on the HDTF [Zhang et al., 2021] dataset, following the test split used in [Ma et al., 2023, Li et al., 2023a], including 19 video clips. |
| Dataset Splits | Yes | For training videos, we uniformly sample frames based on the video's length: 25 frames if the video is less than 2 seconds, 50 frames if the video is 2 to 3 seconds, and 75 frames if the video is longer than 3 seconds. This resulted in a dataset that includes 586,382 frames from 15,204 video clips. ... For evaluation, we use sampled frames from the VFHQ original test split, consisting of 5000 frames from 100 videos. The first frame of each video serves as the source image, with the remaining frames used as driving and target images for reenactment. (See the sampling sketch below.) |
| Hardware Specification | Yes | The training process is conducted on an NVIDIA Tesla A100 GPU and takes approximately 46 GPU hours, demonstrating efficient resource utilization. During inference, our method achieves 67 FPS on an A100 GPU while using only 2.5 GB of VRAM, showcasing high efficiency. |
| Software Dependencies | No | The paper states "Our framework is built on the PyTorch [Paszke et al., 2017] platform," "We use DINOv2 Base as our feature extractor," and "Our neural renderer employs StyleUNet [Wang et al., 2021b]," but it does not specify exact version numbers for PyTorch or other libraries/models. (See the feature-extractor sketch below.) |
| Experiment Setup | Yes | During training, we use the ADAM [Kingma and Ba, 2014] optimizer with a learning rate of 1.0e-4. The DINOv2 [Oquab et al., 2023] backbone is frozen during training and is not trained or fine-tuned. Our training consists of 200,000 iterations with a total batch size of 8. (See the training-setup sketch below.) |
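
The Dataset Splits row quotes a duration-based frame-sampling rule: 25 frames for clips under 2 seconds, 50 frames for clips of 2 to 3 seconds, and 75 frames for longer clips. A minimal sketch of that rule is below; the function and variable names (`sample_training_frames`, `total_frames`, `duration_s`) are hypothetical and are not taken from the authors' released code.

```python
# Minimal sketch of the duration-based frame sampling described in the paper.
# Function and variable names are hypothetical; this is not the authors' code.
def sample_training_frames(total_frames: int, duration_s: float) -> list[int]:
    """Uniformly sample frame indices based on clip duration."""
    if duration_s < 2.0:
        budget = 25          # clips shorter than 2 seconds
    elif duration_s <= 3.0:
        budget = 50          # clips of 2 to 3 seconds
    else:
        budget = 75          # clips longer than 3 seconds
    budget = min(budget, total_frames)
    step = total_frames / budget
    return [int(i * step) for i in range(budget)]

# Example: a 2.5-second clip decoded at 30 fps (75 frames) yields 50 uniform indices.
print(len(sample_training_frames(total_frames=75, duration_s=2.5)))  # 50
```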
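The Software Dependencies row notes that DINOv2 Base serves as the feature extractor but that no versions are pinned. The sketch below shows one way to obtain a DINOv2 Base (ViT-B/14) backbone via torch.hub; the hub entry point comes from the public facebookresearch/dinov2 release, not from the paper, so treat it as an assumption about how the backbone could be loaded rather than the authors' exact setup.

```python
# Minimal sketch: obtaining a DINOv2 Base (ViT-B/14) feature extractor via torch.hub.
# The hub entry point is from the public facebookresearch/dinov2 repository, not the
# paper; the exact versions used by the authors are not stated.
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)  # side length must be divisible by the 14-px patch size
    feats = backbone(dummy)              # (1, 768) global feature for the Base model
print(feats.shape)
```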
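The Experiment Setup row pins down the optimizer (ADAM at a learning rate of 1.0e-4), a frozen DINOv2 backbone, 200,000 iterations, and a total batch size of 8. A minimal PyTorch sketch of that configuration follows; the backbone stand-in, model head, data, and loss are placeholders, not the GAGAvatar architecture.

```python
# Minimal sketch of the quoted training configuration: Adam at lr 1e-4, a frozen
# backbone, 200,000 iterations, batch size 8. Model, data, and loss are placeholders.
import torch

# Stand-in for the frozen DINOv2 Base feature extractor (768-dim output).
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 768))
for p in backbone.parameters():
    p.requires_grad = False  # frozen: not trained or fine-tuned

trainable_head = torch.nn.Linear(768, 3)  # stand-in for the trainable avatar modules
optimizer = torch.optim.Adam(trainable_head.parameters(), lr=1.0e-4)

TOTAL_ITERS, BATCH_SIZE = 200_000, 8
for step in range(TOTAL_ITERS):
    images = torch.randn(BATCH_SIZE, 3, 224, 224)  # placeholder batch
    targets = torch.randn(BATCH_SIZE, 3)           # placeholder targets
    with torch.no_grad():
        feats = backbone(images)                   # frozen features, no gradients
    loss = torch.nn.functional.mse_loss(trainable_head(feats), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```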