Efficient Virtual View Selection for 3D Hand Pose Estimation
Authors: Jian Cheng, Yanguang Wan, Dexin Zuo, Cuixia Ma, Jian Gu, Ping Tan, Hongan Wang, Xiaoming Deng, Yinda Zhang
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three main benchmark datasets including NYU, ICVL and Hands2019 demonstrate that our method outperforms the state of the art on NYU and ICVL, achieves very competitive performance on Hands2019-Task1, and that our proposed virtual view selection and fusion modules are both effective for 3D hand pose estimation. |
| Researcher Affiliation | Collaboration | 1Beijing Key Lab of HCI, Institute of Software, Chinese Academy of Sciences 2University of Chinese Academy of Sciences 3Alibaba 4Simon Fraser University 5Google |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available on the project webpage https://github.com/iscas3dv/handpose-virtualview. |
| Open Datasets | Yes | NYU Hand Pose Dataset (NYU) (Tompson et al. 2014) contains 72,757 frames for training and 8,252 frames for testing. 36 hand joints are annotated, but we use only a subset of 14 hand joints for evaluations following the same evaluation protocol in (Tompson et al. 2014). ICVL Hand Pose Dataset (ICVL) (Tang et al. 2014) contains 331,006 frames for training and 1,596 frames for testing. 16 hand joints are annotated. Task 1 of Hands19 Challenge Dataset (Hands19-Task1) (Armagan et al. 2020) contains 175,951 training depth images from 5 subjects and 124,999 testing depth images from 10 subjects, in which 5 subjects overlap with the training set. |
| Dataset Splits | No | The paper specifies training and testing sets for NYU, ICVL, and Hands19-Task1 datasets but does not explicitly mention or detail a separate validation set split or its size. |
| Hardware Specification | Yes | We train and evaluate our models on a workstation with two Intel Xeon Silver 4210R, 512GB of RAM and an Nvidia RTX3090 GPU. |
| Software Dependencies | No | The paper states: 'Our models are implemented within PyTorch. Adam optimizer is used'. However, it does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Adam optimizer is used; the initial learning rate is set to 0.001 and is decayed by 0.9 per epoch. In order to conduct data augmentation, we randomly scale the cropped depth map, jitter the centroid of the point cloud, and randomly rotate the camera when rendering multi-view depth. For all smooth L1 losses, the switch point between quadratic and linear is set to 1.0. Our network input is a 176×176 hand region cropped from the input depth, and we use ResNet-50 as the backbone of A2J. We first train the 3D pose estimation network and the teacher confidence network together; the loss can be formulated as: L_viewsel = L_A2J + γ·L_J (Eq. 5), where γ = 0.1 is the factor to balance the loss terms. (A hedged training-setup sketch follows the table.) |
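
The reported experiment setup maps onto a straightforward PyTorch training configuration. The following is a minimal, hypothetical sketch under the stated hyperparameters (Adam with initial learning rate 0.001 decayed by 0.9 per epoch, smooth L1 loss with switch point 1.0, γ = 0.1); the tiny networks, loss targets, and dummy data below are placeholders for illustration only, not the authors' A2J pose estimator or teacher confidence network.

```python
import torch
import torch.nn as nn

class TinyPoseNet(nn.Module):
    """Placeholder for the A2J pose estimator (ResNet-50 backbone in the paper)."""
    def __init__(self, num_joints=14):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=4, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, num_joints * 3),
        )

    def forward(self, x):
        return self.net(x)

class TinyConfNet(nn.Module):
    """Placeholder for the teacher confidence network (one score per view)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=4, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 1),
        )

    def forward(self, x):
        return self.net(x)

pose_net, conf_net = TinyPoseNet(), TinyConfNet()
params = list(pose_net.parameters()) + list(conf_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)                             # initial lr 0.001
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)  # decay by 0.9 per epoch

smooth_l1 = nn.SmoothL1Loss(beta=1.0)  # switch point between quadratic and linear = 1.0
gamma = 0.1                            # weight balancing the two loss terms (Eq. 5)

# Dummy batch: 176x176 cropped depth maps, 14 ground-truth 3D joints, confidence targets.
depth = torch.randn(4, 1, 176, 176)
joints_gt = torch.randn(4, 14 * 3)
conf_gt = torch.randn(4, 1)

for epoch in range(2):
    pred_joints = pose_net(depth)
    pred_conf = conf_net(depth)
    loss_a2j = smooth_l1(pred_joints, joints_gt)  # stand-in for the A2J loss L_A2J
    loss_j = smooth_l1(pred_conf, conf_gt)        # stand-in for the confidence loss L_J
    loss = loss_a2j + gamma * loss_j              # L_viewsel = L_A2J + gamma * L_J
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

In an actual reproduction the dummy tensors would be replaced by the cropped-depth data loader with the augmentations described above (random scaling, centroid jitter, and random camera rotation when rendering multi-view depth).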