Expressive Gaussian Human Avatars from Monocular RGB Video

Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang "Atlas" Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on two benchmarks demonstrate the superiority of our approach both quantitatively and qualitatively, especially on the fine-grained hand and facial details.
Researcher Affiliation | Academia | Hezhen Hu¹, Zhiwen Fan¹, Tianhao Wu², Yihan Xi¹, Seoyoung Lee¹, Georgios Pavlakos¹, Zhangyang Wang¹; ¹University of Texas at Austin, ²University of Cambridge
Pseudocode | No | The paper describes the technical approach using mathematical equations and prose, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | We make our code available at the project website: https://evahuman.github.io.
Open Datasets | Yes | The experiments are conducted on two datasets, XHumans [35] and our collected UPB dataset.
Dataset Splits | No | For each identity in the XHumans dataset, one video is selected as the training split and the other videos are marked as testing. During training, we utilize all of the frames (150 frames). We sample 20 frames for each testing video with a sampling rate of 5. For the UPB dataset, we uniformly sample the frames with an interval of 1 to split the training and testing frames. The numbers of training and testing frames are both 140.
Hardware Specification | Yes | Our framework is implemented on PyTorch and all experiments are performed on an NVIDIA A5000.
Software Dependencies | No | The paper mentions that the framework is implemented on PyTorch but does not specify its version number or any other software dependencies with specific versioning information.
Experiment Setup | Yes | The hyperparameters λm, λs and λl are set to 0.1, 0.01 and 0.04, respectively. We use SGHM [4] to extract the human mask. The 3D Gaussian optimization lasts for 2,000 iterations, with densification performed between iterations 400 and 1,000. For other parameters, we follow the original settings of [19]. µ is set to 1. λt is set to -9.0, -4.5 and -6.3 for the body, hand, and face parts, respectively. We set e for the body, hand, and face parts to 2e-4, 1e-4 and 1.4e-4, respectively. The feedback module E(·) consists of two 2D convolutional networks. For SMPL-X alignment, we utilize the L-BFGS optimizer with the Wolfe line search.
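The Dataset Splits row above fully specifies how frames are partitioned. The following is a minimal sketch of that selection logic, assuming frame lists are already loaded; the helper names are hypothetical and not taken from the authors' code.

```python
# Hypothetical helpers reproducing the reported splits (not the authors' code).

def xhumans_split(train_video_frames, test_videos, stride=5, num_test=20):
    """XHumans: all 150 frames of the chosen training video are used for training;
    each remaining video contributes 20 test frames sampled at a stride of 5."""
    train = list(train_video_frames)
    test = [video[::stride][:num_test] for video in test_videos]
    return train, test

def upb_split(frames):
    """UPB: frames are split alternately (interval 1), giving 140 training and
    140 testing frames for a 280-frame sequence."""
    return frames[0::2], frames[1::2]
```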
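Similarly, the values in the Experiment Setup row can be collected into a single configuration for reference, and the reported SMPL-X alignment maps naturally onto PyTorch's built-in L-BFGS with strong Wolfe line search. The dictionary keys, the step count, and the alignment wrapper below are illustrative assumptions; only the numeric values come from the paper.

```python
import torch

# Reported hyperparameters gathered into one (hypothetical) config; key names are assumed.
CONFIG = {
    "lambda_m": 0.1,               # loss weights λm, λs, λl as reported in the paper
    "lambda_s": 0.01,
    "lambda_l": 0.04,
    "mu": 1.0,
    "gaussian_iters": 2000,        # 3D Gaussian optimization iterations
    "densify_range": (400, 1000),  # densification active in this iteration interval
    "lambda_t": {"body": -9.0, "hand": -4.5, "face": -6.3},
    "e": {"body": 2e-4, "hand": 1e-4, "face": 1.4e-4},
}

def align_smplx(params, loss_fn, num_steps=30):
    """SMPL-X alignment via L-BFGS with (strong) Wolfe line search, as stated in the
    paper; num_steps and the closure structure are assumptions for illustration."""
    optimizer = torch.optim.LBFGS(params, line_search_fn="strong_wolfe")

    def closure():
        optimizer.zero_grad()
        loss = loss_fn()
        loss.backward()
        return loss

    for _ in range(num_steps):
        optimizer.step(closure)
```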