Expressive Gaussian Human Avatars from Monocular RGB Video

Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang "Atlas" Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two benchmarks demonstrate the superiority of our approach both quantitatively and qualitatively, especially on the fine-grained hand and facial details.
Researcher Affiliation | Academia | Hezhen Hu¹, Zhiwen Fan¹, Tianhao Wu², Yihan Xi¹, Seoyoung Lee¹, Georgios Pavlakos¹, Zhangyang Wang¹; ¹University of Texas at Austin, ²University of Cambridge
Pseudocode | No | The paper describes the technical approach using mathematical equations and prose, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | We make our code available at the project website: https://evahuman.github.io.
Open Datasets | Yes | The experiments are conducted on two datasets, XHumans [35] and our collected UPB dataset.
Dataset Splits | No | For each identity in the XHumans dataset, one video is selected as the training split and the other videos are marked as testing. During training, we utilize all of the frames (150 frames). We sample 20 frames for each testing video with a sampling rate of 5. For the UPB dataset, we uniformly sample the frames with an interval of 1 to split the training and testing frames. The numbers of training and testing frames are both 140.
Hardware Specification | Yes | Our framework is implemented on PyTorch and all experiments are performed on an NVIDIA A5000.
Software Dependencies | No | The paper mentions that the framework is implemented on PyTorch but does not specify its version number or any other software dependencies with specific versioning information.
Experiment Setup | Yes | The hyperparameters λm, λs and λl are set to 0.1, 0.01 and 0.04, respectively. We use SGHM [4] to extract the human mask. The 3D Gaussian optimization lasts for 2,000 iterations, with densification performed between iterations 400 and 1,000. For other parameters, we follow the original settings of [19]. µ is set to 1. λt is set to -9.0, -4.5 and -6.3 for the body, hand, and face parts, respectively. We set e for the body, hand, and face parts to 2e-4, 1e-4 and 1.4e-4, respectively. The feedback module E(·) consists of two 2D convolutional networks. For SMPL-X alignment, we utilize the L-BFGS optimizer with the Wolfe line search.
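The Dataset Splits row above fully specifies how frames are partitioned. The following is a minimal sketch of that selection logic, assuming frame lists are already loaded; the helper names are hypothetical and not taken from the authors' code.

```python
# Hypothetical helpers reproducing the reported splits (not the authors' code).

def xhumans_split(train_video_frames, test_videos, stride=5, num_test=20):
    """XHumans: all 150 frames of the chosen training video are used for training;
    each remaining video contributes 20 test frames sampled at a stride of 5."""
    train = list(train_video_frames)
    test = [video[::stride][:num_test] for video in test_videos]
    return train, test

def upb_split(frames):
    """UPB: frames are split alternately (interval 1), giving 140 training and
    140 testing frames for a 280-frame sequence."""
    return frames[0::2], frames[1::2]
```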
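Similarly, the values in the Experiment Setup row can be collected into a single configuration for reference, and the reported SMPL-X alignment maps naturally onto PyTorch's built-in L-BFGS with strong Wolfe line search. The dictionary keys, the step count, and the alignment wrapper below are illustrative assumptions; only the numeric values come from the paper.

```python
import torch

# Reported hyperparameters gathered into one (hypothetical) config; key names are assumed.
CONFIG = {
    "lambda_m": 0.1,               # loss weights λm, λs, λl as reported in the paper
    "lambda_s": 0.01,
    "lambda_l": 0.04,
    "mu": 1.0,
    "gaussian_iters": 2000,        # 3D Gaussian optimization iterations
    "densify_range": (400, 1000),  # densification active in this iteration interval
    "lambda_t": {"body": -9.0, "hand": -4.5, "face": -6.3},
    "e": {"body": 2e-4, "hand": 1e-4, "face": 1.4e-4},
}

def align_smplx(params, loss_fn, num_steps=30):
    """SMPL-X alignment via L-BFGS with (strong) Wolfe line search, as stated in the
    paper; num_steps and the closure structure are assumptions for illustration."""
    optimizer = torch.optim.LBFGS(params, line_search_fn="strong_wolfe")

    def closure():
        optimizer.zero_grad()
        loss = loss_fn()
        loss.backward()
        return loss

    for _ in range(num_steps):
        optimizer.step(closure)
```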