Model-Based Deep Hand Pose Estimation

Authors: Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach is validated on challenging public datasets. It achieves state-of-the-art accuracy on both joint location and rotation angles. Metrics: joint location error and angle error. Table 1: Comparison of our approach and different baselines on NYU test dataset. It shows that our approach is best on both average joint and pose (angle) accuracy.
Researcher Affiliation | Collaboration | Xingyi Zhou¹, Qingfu Wan¹, Wei Zhang¹, Xiangyang Xue¹, Yichen Wei². ¹Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; ²Microsoft Research. Emails: ¹{zhouxy13, qfwan13, weizh, xyxue}@fudan.edu.cn; ²yichenw@microsoft.com
Pseudocode | No | The paper provides mathematical formulations and descriptions of the process, but it does not include a distinct pseudocode block or algorithm section.
Open Source Code | Yes | The framework of our approach is briefly illustrated in Figure 1. Our code is publicly available at https://github.com/tenstep/DeepModel
Open Datasets | Yes | We use two recent public datasets that are widely used in depth-based hand pose estimation. The NYU [Tompson et al., 2014] dataset contains 72,757 training and 8,252 testing images, captured by a PrimeSense camera. The ICVL [Tang et al., 2014] dataset has over 300k training depth images and 2 testing sequences, each with about 800 frames.
Dataset Splits | No | The paper specifies training and testing sizes for the NYU and ICVL datasets but does not explicitly mention a separate validation split or its size.
Hardware Specification | Yes | On a PC with an Intel Core i7-4770 @ 3.40GHz, 32GB of RAM, and an NVIDIA GeForce 960 GPU, one forward pass takes about 8ms, resulting in 125 frames per second at test time. A minimal latency-measurement sketch is given after the table.
Software Dependencies | No | The paper states that the approach is "implemented in Caffe", but it does not provide a specific version number for Caffe or any other key software dependencies.
Experiment Setup | Yes | It starts with 3 convolutional layers with kernel sizes 5, 5, and 3, respectively, followed by max pooling with strides 4, 2, and 1 (no padding). All convolutional layers have 8 channels, and the resulting convolutional feature maps are 12 × 12 × 8. There are then two fully connected (fc) layers, each with 1024 neurons and followed by a dropout layer with dropout ratio 0.3. For all convolutional and fc layers, the activation function is ReLU. In optimization, we use standard stochastic gradient descent with batch size 512, learning rate 0.003, and momentum 0.9. Training proceeds until convergence. The weight λ balances the two losses and is fixed to 1 in all experiments. A sketch of this architecture follows the table.
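
The released code is in Caffe, with no framework version pinned. As a rough illustration of the setup quoted in the Experiment Setup row, here is a minimal PyTorch sketch of the CNN and optimizer. The 128 × 128 single-channel input crop, the Caffe-style ceil-mode pooling (needed to reach the stated 12 × 12 × 8 feature map), and the 26-dimensional pose output are assumptions not stated in the quoted text; the model-based forward-kinematics layer and the λ-weighted second loss term are omitted.

```python
# Minimal PyTorch sketch of the CNN quoted in the Experiment Setup row.
# ASSUMPTIONS (not in the quoted text): 128x128 single-channel depth crop,
# Caffe-style ceil-mode pooling, and a hypothetical 26-D pose output.
import torch
import torch.nn as nn

POSE_DIM = 26  # hypothetical; the paper regresses a vector of hand pose parameters

class DeepModelNet(nn.Module):
    def __init__(self, pose_dim: int = POSE_DIM):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),             # 128 -> 124
            nn.ReLU(inplace=True),
            nn.MaxPool2d(4, stride=4, ceil_mode=True),  # 124 -> 31
            nn.Conv2d(8, 8, kernel_size=5),             # 31 -> 27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, stride=2, ceil_mode=True),  # 27 -> 14 (ceil mode)
            nn.Conv2d(8, 8, kernel_size=3),             # 14 -> 12
            nn.ReLU(inplace=True),
            nn.MaxPool2d(1, stride=1),                  # stride-1, kernel-1: identity
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),                  # 8 * 12 * 12 = 1152 features
            nn.Linear(8 * 12 * 12, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3),
            nn.Linear(1024, 1024),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.3),
            nn.Linear(1024, pose_dim),     # pose parameters (fed to a kinematic layer in the paper)
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(depth))

model = DeepModelNet()
# Optimizer settings quoted from the paper: SGD, batch size 512,
# learning rate 0.003, momentum 0.9, trained until convergence.
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
```

This sketch covers only the feed-forward regressor; in the paper, the predicted pose parameters pass through a forward-kinematics layer, and λ = 1 weighs the second loss term against the joint-location loss.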
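The quoted timing (about 8 ms per forward pass, i.e. 1000/8 ≈ 125 fps) can be checked with a simple harness. Below is a minimal sketch, assuming the DeepModelNet class from the previous snippet; this is a hypothetical benchmark, not the paper's.

```python
# Hypothetical latency harness: times single-image forward passes and
# converts the mean latency to frames per second.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
net = DeepModelNet().to(device).eval()
x = torch.randn(1, 1, 128, 128, device=device)  # one assumed 128x128 depth crop

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        net(x)
    if device == "cuda":
        torch.cuda.synchronize()         # flush queued GPU kernels before timing
    t0 = time.perf_counter()
    for _ in range(100):
        net(x)
    if device == "cuda":
        torch.cuda.synchronize()
    ms = (time.perf_counter() - t0) / 100 * 1000

print(f"{ms:.1f} ms/frame -> {1000 / ms:.0f} fps")  # ~8 ms would give ~125 fps
```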