Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation

Authors: István Sárándi, Gerard Pons-Moll

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method on a variety of benchmarks: 3DPW [114] and EMDB [47] for SMPL body, AGORA [85] and EHF [87] for SMPL-X, SSP-3D [96] for SMPL focusing on body shape, as well as Human3.6M [40], MPI-INF-3DHP [70] and MuPoTS-3D [72] for 3D skeletons.
Researcher Affiliation | Academia | István Sárándi (1,2), Gerard Pons-Moll (1,2,3): 1 University of Tübingen, Germany; 2 Tübingen AI Center, Germany; 3 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany
Pseudocode | Yes | In Algorithm 1, we provide the simplified pseudocode for our body model fitting algorithm used in the main paper.
Open Source Code | Yes | We will make our code and trained models publicly available for research.
Open Datasets | Yes | We extensively evaluate our method on a variety of benchmarks: 3DPW [114] and EMDB [47] for SMPL body, AGORA [85] and EHF [87] for SMPL-X, SSP-3D [96] for SMPL focusing on body shape, as well as Human3.6M [40], MPI-INF-3DHP [70] and MuPoTS-3D [72] for 3D skeletons.
Dataset Splits | No | The paper mentions using test sets from various benchmarks (e.g., 3DPW, SSP-3D, AGORA) for evaluation, but it does not specify explicit training/validation splits (e.g., percentages or counts for a separate validation set) for the combined meta-dataset used during training. It describes mixed-batch training on a combination of datasets but not a dedicated validation split.
Hardware Specification | Yes | Training the S model takes 2 days on two 40 GB A100 GPUs, while the L takes 4 days on 8 A100s. NLF-S has a batched throughput of 410 fps and unbatched throughput of 79 fps on an Nvidia RTX 3090 GPU.
Software Dependencies | No | The paper mentions several software components such as EfficientNetV2-S and -L [106], AdamW [66], YOLOv8 [42], Blender, and SMPLitex [14], but it does not provide specific version numbers for these components, which would be necessary to fully reproduce the environment.
Experiment Setup | Yes | We use EfficientNetV2-S (256 px) and -L (384 px) [106] initialized from [93], and train with AdamW [66], linear warmup and exponential learning rate decay for 300k steps. Training the S model takes 2 days on two 40 GB A100 GPUs, while the L takes 4 days on 8 A100s. We use random rotation, scaling, translation, truncation, color distortion, synthetic occlusion, random erasing and JPEG compression for data augmentation during training.
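The optimization schedule quoted above (AdamW with linear warmup followed by exponential learning rate decay over 300k steps) can be written down compactly. The sketch below is an illustration only, not the authors' code: the warmup length, peak learning rate, and final decay factor are hypothetical placeholders, since the quoted setup does not state them, and the model is a stand-in for the actual backbone and head.

```python
import torch

# Minimal sketch of AdamW + linear warmup + exponential LR decay, assuming
# PyTorch. Only TOTAL_STEPS (300k) comes from the quoted setup; the other
# constants are placeholders, not values from the paper.
TOTAL_STEPS = 300_000      # quoted: 300k training steps
WARMUP_STEPS = 10_000      # assumption: warmup length is not quoted
PEAK_LR = 2e-4             # assumption: peak learning rate is not quoted
FINAL_LR_FACTOR = 0.01     # assumption: decay to 1% of peak by the last step

model = torch.nn.Linear(1280, 256)  # stand-in for backbone + localizer head
optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR)

def lr_lambda(step: int) -> float:
    """Multiplier on PEAK_LR: linear ramp-up, then exponential decay."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return FINAL_LR_FACTOR ** progress

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(TOTAL_STEPS):
    # ... forward pass, loss computation and loss.backward() on a mixed batch ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

The same warmup-then-decay shape can be reproduced in any framework; the key detail from the quoted setup is that the decay is exponential in the step count rather than stepwise or cosine.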