Towards Robust and Expressive Whole-body Human Pose and Shape Estimation
Authors: Hui En Pang, Zhongang Cai, Lei Yang, Qingyi Tao, Zhonghua Wu, Tianwei Zhang, Ziwei Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We perform comprehensive experiments to demonstrate the effectiveness of RoboSMPLX on body, hands, face and whole-body benchmarks. Codebase is available at https://github.com/robosmplx/robosmplx." (Section 5, Experiments) |
| Researcher Affiliation | Collaboration | Hui En Pang¹, Zhongang Cai¹·², Lei Yang², Qingyi Tao², Zhonghua Wu², Tianwei Zhang¹, Ziwei Liu¹ — ¹S-Lab, Nanyang Technological University; ²SenseTime Research |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codebase is available at https://github.com/robosmplx/robosmplx. |
| Open Datasets | Yes | For whole-body training, we employ Human3.6M (H36M) [13], COCO-Wholebody [14] (the whole-body version of MSCOCO [29]) and MPII [1]. The 3D pseudo-ground truths for training are acquired using Neural Annot [36]. |
| Dataset Splits | No | The paper lists datasets used for training and evaluation but does not explicitly specify the training/validation/test split percentages, sample counts, or the methodology for creating these splits within the main text. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | Subnetworks are trained separately, then integrated in a multi-stage manner. Initial whole-body training runs for 20 epochs. The hand and face modules are then substituted with the trained Hand and Face subnetworks, followed by 20 epochs of fine-tuning to better unify the knowledge from the Hand and Face subnetworks into the whole-body understanding. Each subnetwork is trained by minimizing the loss $L = \lambda_{3D} L_{3D} + \lambda_{2D} L_{2D} + \lambda_{BM} L_{BM} + \lambda_{proj} L_{proj} + \lambda_{segm} L_{segm} + \lambda_{con} L_{con}$ (Eq. 1); a minimal sketch of this weighted sum is given below the table. |
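
The training objective in Eq. (1) is a weighted sum of six loss terms. The sketch below shows only how such a weighted combination could be assembled in PyTorch; the term names, the placeholder loss values, and the weight values are illustrative assumptions, not the authors' released implementation or hyperparameters.

```python
# Minimal sketch of the Eq. (1) weighted-sum loss. All term implementations
# and weight values here are placeholders, not the RoboSMPLX code.
import torch


def total_loss(losses: dict, weights: dict) -> torch.Tensor:
    """Combine individual loss terms with their lambda weights: sum_i lambda_i * L_i."""
    return sum(weights[name] * losses[name] for name in losses)


# Example usage with dummy scalar losses (illustrative values only).
losses = {
    "3d": torch.tensor(0.8),    # L_3D: 3D keypoint loss
    "2d": torch.tensor(0.5),    # L_2D: 2D keypoint loss
    "bm": torch.tensor(0.3),    # L_BM: body-model (SMPL-X parameter) loss
    "proj": torch.tensor(0.2),  # L_proj: reprojection loss
    "segm": torch.tensor(0.4),  # L_segm: segmentation loss
    "con": torch.tensor(0.1),   # L_con: consistency loss
}
weights = {"3d": 1.0, "2d": 1.0, "bm": 1.0, "proj": 1.0, "segm": 1.0, "con": 1.0}

print(total_loss(losses, weights))  # scalar tensor combining all six terms
```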