Generalizable and Animatable Gaussian Head Avatar
Authors: Xuangeng Chu, Tatsuya Harada
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. |
| Researcher Affiliation | Academia | Xuangeng Chu, The University of Tokyo (xuangeng.chu@mi.t.u-tokyo.ac.jp); Tatsuya Harada, The University of Tokyo and RIKEN AIP (harada@mi.t.u-tokyo.ac.jp) |
| Pseudocode | No | The paper describes the model architecture and processes but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code and demos are available at https://github.com/xg-chu/GAGAvatar. |
| Open Datasets | Yes | We use the VFHQ [Xie et al., 2022] dataset to train our model, which comprises clips from various interview scenarios. ... We also evaluate on HDTF [Zhang et al., 2021] dataset, following the test split used in [Ma et al., 2023, Li et al., 2023a], including 19 video clips. |
| Dataset Splits | Yes | For training videos, we uniformly sample frames based on the video's length: 25 frames if the video is less than 2 seconds, 50 frames if the video is 2 to 3 seconds, and 75 frames if the video is longer than 3 seconds. This resulted in a dataset that includes 586,382 frames from 15,204 video clips. ... For evaluation, we use sampled frames from the VFHQ original test split, consisting of 5000 frames from 100 videos. The first frame of each video serves as the source image, with the remaining frames used as driving and target images for reenactment. (A sketch of this frame-sampling rule appears below the table.) |
| Hardware Specification | Yes | The training process is conducted on an NVIDIA Tesla A100 GPU and takes approximately 46 GPU hours, demonstrating efficient resource utilization. During inference, our method achieves 67 FPS on an A100 GPU while using only 2.5 GB of VRAM, showcasing high efficiency. |
| Software Dependencies | No | The paper mentions 'Our framework is built on the PyTorch [Paszke et al., 2017] platform,' 'We use DINOv2 Base as our feature extractor,' and 'Our neural renderer employs StyleUNet [Wang et al., 2021b].' However, it does not specify exact version numbers for PyTorch or other libraries/models. (A sketch of the reported training setup appears below the table.) |
| Experiment Setup | Yes | During training, we use the ADAM [Kingma and Ba, 2014] optimizer with a learning rate of 1.0e-4. The DINOv2 [Oquab et al., 2023] backbone is frozen during training and is not trained or fine-tuned. Our training consists of 200,000 iterations with a total batch size of 8. |
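
The frame-sampling rule quoted in the Dataset Splits row is concrete enough to express directly. Below is a minimal Python sketch of that rule; the function name and signature are ours, not from the paper's released code.

```python
import numpy as np

def sample_frame_indices(total_frames: int, fps: float) -> np.ndarray:
    """Uniformly sample frame indices from a training clip, following the
    rule quoted above: 25 frames if the clip is under 2 s, 50 frames if
    it is 2 to 3 s, and 75 frames if it is longer than 3 s."""
    duration = total_frames / fps
    if duration < 2.0:
        num = 25
    elif duration <= 3.0:
        num = 50
    else:
        num = 75
    # Evenly spaced positions across the clip, rounded to valid indices.
    return np.linspace(0, total_frames - 1, num=num).astype(int)

# Example: a 5-second clip at 30 fps yields 75 uniformly spaced indices.
indices = sample_frame_indices(total_frames=150, fps=30.0)
```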
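The Software Dependencies and Experiment Setup rows together pin down most of the training configuration. The following is a hedged PyTorch sketch of that configuration, not the paper's implementation: `dinov2_vitb14` is assumed to be the intended torch.hub entry for "DINOv2 Base", and the `trainable` module is a hypothetical placeholder for the Gaussian decoder and StyleUNet renderer, whose actual layout is in the released code.

```python
import torch

# DINOv2 Base backbone; "dinov2_vitb14" is the standard torch.hub entry
# for the Base model (the paper only says "DINOv2 Base").
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")

# "The DINOv2 backbone is frozen during training and is not trained
# or fine-tuned."
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# Hypothetical stand-in for the trainable parts (Gaussian decoder,
# StyleUNet renderer, ...); 768 is the DINOv2 Base embedding width.
trainable = torch.nn.Linear(768, 3)

# "ADAM optimizer with a learning rate of 1.0e-4"
optimizer = torch.optim.Adam(trainable.parameters(), lr=1.0e-4)

# "200,000 iterations with a total batch size of 8"
TOTAL_ITERS = 200_000
BATCH_SIZE = 8
```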