Sequential 3D Human Pose Estimation Using Adaptive Point Cloud Sampling Strategy

Authors: Zihao Zhang, Lei Hu, Xiaoming Deng, Shihong Xia

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the ITOP dataset and the NTU RGB-D dataset demonstrate that all of our contributed components are effective, and our method can achieve state-of-the-art performance. |
| Researcher Affiliation | Academia | ¹Institute of Computing Technology, Chinese Academy of Sciences; ²University of Chinese Academy of Sciences; ³Institute of Software, Chinese Academy of Sciences |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is provided via the supplementary material: https://github.com/Hmslab/Adapose |
| Open Datasets | Yes | In our experiment, we use the ITOP dataset [Haque et al., 2016] and the NTU RGB-D dataset [Shahroudy et al., 2016; Liu et al., 2019a] to evaluate our method. |
| Dataset Splits | No | The paper mentions using fully labeled and weakly labeled data for training and evaluates on the 'ITOP test dataset', but it does not explicitly describe a separate validation split. |
| Hardware Specification | Yes | The running times of our method, V2V, and WSM are 50.0, 3.5, and 24.4 FPS on a single NVIDIA 2080Ti GPU. |
| Software Dependencies | No | The paper mentions components such as the Adam optimizer, PointNet, and LSTM, but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | During the training process, we use the Adam optimizer with a learning rate of 0.0005, which is set to decay 0.05% every 1000 iterations. The bounding box size L is [1.8, 2, 1.5]. In our experiments, we set the weights λ3D, λ2D, λconsis, and λsam to 10, 0.1, 1e-3, and 1. In the point cloud sampling module, we choose ϵ = 0.025 and M = 4 in the sampling center generation step, and 8-nearest neighbors in the projection step. In the density-based sampling module, the original point clouds are fed into five 1D convolution layers, each followed by a ReLU activation layer. The output dimensions of the five convolution layers are 64, 128, 256, 512, and 128, respectively. A fully connected layer with 512 neurons then generates the sampling centers and the weights of the original point clouds. (See the sketch after this table.) |
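
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the density-based sampling network and the training configuration, assuming the released code is PyTorch-based. Only the five convolution output dimensions (64, 128, 256, 512, 128), the ReLU activations, the 512-neuron fully connected layer, M = 4, the loss weights, and the Adam learning rate come from the quoted text. The 3-channel (x, y, z) input, the max-pooling step, the head layers (`center_head`, `weight_head`), the sigmoid on the per-point weights, and the 0.9995 decay factor (reading "decay 0.05% every 1000 iterations" as multiplying the rate by 1 − 0.0005) are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn


class DensitySamplingNet(nn.Module):
    """Sketch of the density-based sampling module described in the paper.

    Five 1D convolutions (output dims 64, 128, 256, 512, 128), each followed
    by ReLU, encode the raw point cloud; a 512-neuron fully connected layer
    then produces the sampling centers and per-point weights. The pooling
    step and the two output heads below are assumptions, not taken from
    the paper.
    """

    def __init__(self, in_channels: int = 3, num_centers: int = 4):
        super().__init__()
        self.num_centers = num_centers                 # M = 4 in the paper
        dims = [64, 128, 256, 512, 128]                # conv output dims from the paper
        layers, prev = [], in_channels
        for d in dims:
            layers += [nn.Conv1d(prev, d, kernel_size=1), nn.ReLU(inplace=True)]
            prev = d
        self.encoder = nn.Sequential(*layers)
        self.fc = nn.Linear(dims[-1], 512)             # the 512-neuron FC layer
        # Hypothetical heads: M sampling centers (3 coords each) and one
        # scalar weight per input point.
        self.center_head = nn.Linear(512, num_centers * 3)
        self.weight_head = nn.Conv1d(dims[-1], 1, kernel_size=1)

    def forward(self, points: torch.Tensor):
        # points: (B, 3, N) raw point cloud
        feats = self.encoder(points)                   # (B, 128, N) per-point features
        global_feat = feats.max(dim=2).values          # (B, 128), assumed max-pool
        hidden = torch.relu(self.fc(global_feat))      # (B, 512)
        centers = self.center_head(hidden).view(-1, self.num_centers, 3)
        weights = torch.sigmoid(self.weight_head(feats)).squeeze(1)  # (B, N)
        return centers, weights


# Training configuration quoted in the paper: Adam, lr = 0.0005, decayed
# 0.05% every 1000 iterations (implemented here as lr *= 0.9995).
model = DensitySamplingNet()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9995)

# Loss weights quoted in the paper (the loss terms themselves are not shown here).
loss_weights = {"lambda_3d": 10.0, "lambda_2d": 0.1,
                "lambda_consis": 1e-3, "lambda_sam": 1.0}

# Shape check on a dummy batch of 1024 points.
centers, weights = model(torch.randn(2, 3, 1024))
assert centers.shape == (2, 4, 3) and weights.shape == (2, 1024)
```

How the predicted centers and weights feed into the sampling-center generation step (with ϵ = 0.025) and the 8-nearest-neighbor projection step is not specified in the quoted text, so those stages are omitted from the sketch.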