DRPose3D: Depth Ranking in 3D Human Pose Estimation

Authors: Min Wang, Xipeng Chen, Wentao Liu, Chen Qian, Liang Lin, Lizhuang Ma

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach outperforms state-of-the-art methods on the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature that can be learned to improve 3D pose estimation. The proposed DRPose3D framework achieves state-of-the-art results on three common protocols of the Human3.6M dataset compared with both end-to-end and two-stage methods [Sun et al., 2017; Fang et al., 2018; Martinez et al., 2017]. Mean per joint position errors (MPJPE) on the three protocols are reduced to 57.8mm (2.2%), 42.9mm (6.1%) and 62.8mm (13.7%) respectively, and the MPJPE gap between protocol #3 and protocol #1 is reduced to 5.0mm (59.7%). This shows that our method is robust to new camera positions and that our data augmentation is effective. We evaluate our method on Human3.6M and compare with state-of-the-art methods. To verify the impact of each component in our approach, we also perform ablation studies.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 School of Data and Computer Science, Sun Yat-Sen University; 3 Department of Computer Science and Technology, Tsinghua University; 4 SenseTime Group Limited; 5 School of Computer Science and Software Engineering, East China Normal University
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about making its source code publicly available, nor a link to a code repository.
Open Datasets | Yes | Human3.6M is currently the largest public 3D human pose benchmark. The dataset captured human poses in a laboratory environment with motion-capture technology. It consists of 3.6 million images describing daily activities. There are 4 cameras, 11 subjects (actors) and 17 scenarios (actions) in this dataset. We use mean per joint position error (MPJPE) as the evaluation metric and adopt it in three protocols described in previous works [Fang et al., 2018]. [Ionescu et al., 2014] Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. TPAMI, 36(7):1325–1339, 2014. MPII is widely used for 2D human pose estimation in the wild. We will provide qualitative evaluation on this dataset. [Andriluka et al., 2014] Mykhaylo Andriluka, Leonid Pishchulin, Peter Gehler, and Bernt Schiele. 2D human pose estimation: New benchmark and state of the art analysis. In CVPR, pages 3686–3693, 2014.
Dataset Splits | No | The paper specifies training and testing splits for Human3.6M using different protocols (e.g., Protocol #1 uses subjects S1, S5, S6, S7, S8 for training and S9 and S11 for testing; Protocol #3 uses 3 camera views for training and one for testing). While it mentions training epochs and learning rates for DPNet, it does not explicitly define a separate validation dataset split with percentages or counts.
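The subject-level split described above (Protocol #1) can be sketched as a small lookup; the dictionary and function names below are illustrative assumptions, not from the paper's code.

```python
# Hypothetical sketch of the Human3.6M Protocol #1 subject split quoted above.
# Names (H36M_PROTOCOL1, split_for) are illustrative, not from the paper.
H36M_PROTOCOL1 = {
    "train_subjects": ["S1", "S5", "S6", "S7", "S8"],
    "test_subjects": ["S9", "S11"],
}

def split_for(subject: str) -> str:
    """Return 'train' or 'test' for a Human3.6M subject ID under Protocol #1."""
    if subject in H36M_PROTOCOL1["train_subjects"]:
        return "train"
    if subject in H36M_PROTOCOL1["test_subjects"]:
        return "test"
    raise ValueError(f"Unknown subject: {subject}")
```

Note that no subject is reserved for validation, which is the basis for the "No" verdict in this row.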
Hardware Specification | Yes | In all experiments, the models are trained on 8 TITAN Xp GPUs with batch size 64 and an initial learning rate of 0.1. With the benefit of low dimensionality, we only use one TITAN Xp GPU to train this network.
Software Dependencies | No | The paper mentions optimizers such as SGD and Adam, and architectures such as the stacked hourglass and ResNet-34, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | The variance of the Gaussian is set to 4 in our experiments. We train the PRCNN model with binary cross-entropy loss and use stochastic gradient descent (SGD) for 25 epochs over the whole training set. In all experiments, the models are trained on 8 TITAN Xp GPUs with batch size 64 and an initial learning rate of 0.1. We set the root of the 3D pose to (0,0,0) following [Martinez et al., 2017]. We train our DPNet for 400 epochs using Adam; the initial learning rate is 0.001 with exponential decay. The mini-batch size is set to 64. The dropout probability is set to 0.3 so that more of the ranking information is retained.
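The hyperparameters quoted in this row can be collected into configuration dictionaries; this is a minimal sketch assuming the values stated above, with key names chosen for illustration rather than taken from the paper's (unreleased) code.

```python
# Hypothetical reconstruction of the training setups described in the row above.
# All values come from the quoted text; the key names are assumptions.
PRCNN_CONFIG = {
    "loss": "binary_cross_entropy",
    "optimizer": "SGD",
    "epochs": 25,
    "batch_size": 64,
    "initial_lr": 0.1,
    "gaussian_variance": 4,  # variance of the Gaussian target heatmaps
    "gpus": 8,               # TITAN Xp
}

DPNET_CONFIG = {
    "optimizer": "Adam",
    "epochs": 400,
    "batch_size": 64,
    "initial_lr": 0.001,
    "lr_schedule": "exponential_decay",
    "dropout": 0.3,          # kept low to retain ranking information
    "gpus": 1,               # low-dimensional input fits on one TITAN Xp
}
```

Laying the values out this way makes the asymmetry explicit: the heatmap-based PRCNN needs 8 GPUs and a large learning rate, while the low-dimensional DPNet trains on a single GPU with Adam and a much smaller learning rate.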