Model-Based Deep Hand Pose Estimation
Authors: Xingyi Zhou, Qingfu Wan, Wei Zhang, Xiangyang Xue, Yichen Wei
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach is validated on challenging public datasets. It achieves state-of-the-art accuracy on both joint location and rotation angles. We use two recent public datasets that are widely used in depth based hand pose estimation. NYU [Tompson et al., 2014] dataset contains 72,757 training and 8,252 testing images, captured by PrimeSense camera. ICVL [Tang et al., 2014] dataset has over 300k training depth images and 2 testing sequences with each about 800 frames. Metrics: joint location error and angle error (sketched in the first code block below the table). Table 1: Comparison of our approach and different baselines on NYU test dataset. It shows that our approach is best on both average joint and pose (angle) accuracy. |
| Researcher Affiliation | Collaboration | Xingyi Zhou¹, Qingfu Wan¹, Wei Zhang¹, Xiangyang Xue¹, Yichen Wei². ¹Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; ²Microsoft Research. ¹{zhouxy13, qfwan13, weizh, xyxue}@fudan.edu.cn, ²yichenw@microsoft.com |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of the process, but it does not include a distinct pseudocode block or algorithm section. |
| Open Source Code | Yes | The framework of our approach is briefly illustrated in Figure 1. Our code is publicly available at https://github.com/tenstep/DeepModel |
| Open Datasets | Yes | We use two recent public datasets that are widely used in depth based hand pose estimation. NYU [Tompson et al., 2014] dataset contains 72,757 training and 8,252 testing images, captured by PrimeSense camera. ICVL [Tang et al., 2014] dataset has over 300k training depth images and 2 testing sequences with each about 800 frames. |
| Dataset Splits | No | The paper specifies training and testing sizes for the NYU and ICVL datasets but does not explicitly mention a separate validation split or its size. |
| Hardware Specification | Yes | On a PC with an Intel Core i7 4770 3.40GHz, 32GB of RAM, and an Nvidia GeForce 960 GPU, one forward pass takes about 8ms, resulting in 125 frames per second in test. |
| Software Dependencies | No | The paper states that the approach is 'implemented in Caffe', but it does not provide a specific version number for Caffe or any other key software dependencies. |
| Experiment Setup | Yes | It starts with 3 convolutional layers with kernel size 5, 5, 3, respectively, followed by max pooling with stride 4, 2, 1 (no padding), respectively. All the convolutional layers have 8 channels. The resulting convolutional feature maps are 12 × 12 × 8. There are then two fully connected (fc) layers, each with 1024 neurons and followed by a dropout layer with dropout ratio 0.3. For all convolutional and fc layers, the activation function is ReLU. In optimization, we use standard stochastic gradient descent, with batch size 512, learning rate 0.003 and momentum 0.9. The training is processed until convergence. The weight λ balances the two losses and is fixed to 1 in all our experiments. (This setup is sketched in the second code block below the table.) |
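
The two metrics quoted in the Research Type row (average joint location error and average angle error) can be illustrated as follows. This is a minimal sketch, not the authors' evaluation code; the array shapes, function names, and unit conventions (millimeters for joints, degrees for angles) are assumptions.

```python
# Minimal sketch of the two evaluation metrics named in the table above.
# Shapes, names, and units are assumptions, not the authors' code.
import numpy as np

def mean_joint_error(pred_joints, gt_joints):
    """Average Euclidean distance per joint (e.g., in mm).

    pred_joints, gt_joints: (N, J, 3) arrays of 3D joint locations.
    """
    return np.linalg.norm(pred_joints - gt_joints, axis=-1).mean()

def mean_angle_error(pred_angles, gt_angles):
    """Average absolute error over pose (rotation) angles (e.g., in degrees).

    pred_angles, gt_angles: (N, D) arrays of joint angles.
    """
    return np.abs(pred_angles - gt_angles).mean()
```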
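The experiment setup row translates fairly directly into a network definition. Below is a minimal PyTorch sketch (the paper's implementation is in Caffe); the input resolution (128 × 128), the pooling kernel sizes (taken equal to their strides), and the output dimensionality `num_params` of the pose-parameter vector are assumptions not fixed by the quoted text.

```python
# Minimal PyTorch sketch of the described architecture and optimizer.
# Assumptions (not stated in the quote): 1-channel depth input at 128x128,
# pooling kernels equal to their strides, and num_params output dimensions.
import torch
import torch.nn as nn

class HandPoseNet(nn.Module):
    def __init__(self, num_params: int = 26):  # num_params is hypothetical
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5), nn.ReLU(),   # conv kernel 5, 8 channels
            nn.MaxPool2d(kernel_size=4, stride=4),       # pool stride 4, no padding
            nn.Conv2d(8, 8, kernel_size=5), nn.ReLU(),   # conv kernel 5
            nn.MaxPool2d(kernel_size=2, stride=2),       # pool stride 2, no padding
            nn.Conv2d(8, 8, kernel_size=3), nn.ReLU(),   # conv kernel 3
            nn.MaxPool2d(kernel_size=1, stride=1),       # pool stride 1, no padding
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(), nn.Dropout(p=0.3),    # fc1 + dropout 0.3
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p=0.3),  # fc2 + dropout 0.3
            nn.Linear(1024, num_params),  # pose parameters fed to the model-based layer
        )

    def forward(self, depth):  # depth: (N, 1, H, W) cropped depth patch
        return self.regressor(self.features(depth))

model = HandPoseNet()
model(torch.zeros(1, 1, 128, 128))  # materialize the lazy fc layer (assumed input size)
# SGD with batch size 512, learning rate 0.003, momentum 0.9, trained to convergence.
optimizer = torch.optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
# The paper's training loss combines two terms balanced by the weight lambda = 1,
# schematically: loss = joint_location_loss + 1.0 * angle_loss
```

The `LazyLinear` layer stands in for the first fc layer because the quoted text does not pin down the pooling kernel sizes, and hence the exact flattened feature size; a dummy forward pass materializes it before the optimizer is built.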