Exploiting Learnable Joint Groups for Hand Pose Estimation
Authors: Moran Li, Yuan Gao, Nong Sang (pp. 1921-1929)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The detailed ablation analysis and the extensive experiments on several benchmark datasets demonstrate the promising performance of the proposed method over the state-of-the-art (SOTA) methods. Besides, our method achieves top-1 among all the methods that do not exploit the dense 3D shape labels on the most recently released FreiHAND competition at the submission date. |
| Researcher Affiliation | Collaboration | Moran Li1*, Yuan Gao1,2*, Nong Sang1 1 Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China 2 Tencent AI Lab |
| Pseudocode | No | The paper describes the method using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and models are available at https://github.com/moranli-aca/LearnableGroups-Hand. |
| Open Datasets | Yes | RHD (Rendered Hand Pose Dataset) (Zimmermann and Brox 2017) is a synthetically rendered hand dataset containing 41,285 training and 2,728 testing samples. STB (Stereo Hand Pose Benchmark) (Zhang et al. 2017) is a real hand dataset containing 18,000 stereo-pair samples. Dexter+Object (Sridhar et al. 2016) is a real hand-object interaction dataset consisting of six sequences with two actors (one female). FreiHAND (Zimmermann et al. 2019) is the most recently released real hand dataset, containing 130,240 training and 3,960 testing samples. |
| Dataset Splits | Yes | RHD (Rendered Hand Pose Dataset) (Zimmermann and Brox 2017) is a synthetically rendered hand dataset containing 41,285 training and 2,728 testing samples. STB (Zhang et al. 2017) contains 18,000 stereo-pair samples, split into a training set of 15,000 images and an evaluation set of 3,000 images following (Zimmermann and Brox 2017). FreiHAND (Zimmermann et al. 2019) is the most recently released real hand dataset, containing 130,240 training and 3,960 testing samples. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify any software versions for frameworks (e.g., TensorFlow, PyTorch), programming languages, or other libraries. For example, it does not state 'Python 3.x' or 'PyTorch 1.x'. |
| Experiment Setup | Yes | We implement the data pre-processing and augmentation similar to Yang et al. (Yang et al. 2019). Specifically, we first crop the original RGB images using the bounding box calculated from the ground-truth masks and resize the cropped image to 256×256. Then, we apply online data augmentation with a random scaling in [1, 1.2], a random rotation in [−π, π], a random translation in [−20, 20], and a color jittering with a random hue in [−0.1, 0.1]. We use the Adam optimizer (Kingma and Ba 2014) to train the network. We train the shared feature extraction module (hourglass network) to predict 2D joint locations using the ℓ1 loss for initialization, with a learning rate of 1e-3 and a mini-batch size of 64. Then, we use the loss function defined in Eq. (10) with β = 20 to optimize the overall network. The learning rates for the newly introduced grouped feature learning module and feature fusing module are 1e-1 and 1e-2, respectively. For the remaining network parameters, the learning rate is set to 1e-4 with a mini-batch size of 32. For training, we initialize every θi,j of Eq. (8) to 1/K, as we do not impose priors on the group categorization (K is the number of groups). τ of Eq. (8) is initialized to 5 and decreases by 0.1 every 1,000 steps until it reaches approximately 0. We use 3 groups (i.e., K = 3) in all of our experiments, as we find that further increasing the number of groups produces comparable results (as shown in the ablation analysis in the main paper), which coincides with the conclusion of (Tang and Wu 2019). |
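The quoted experiment-setup hyperparameters can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' released code: the helper names (`tau_at_step`, `init_theta`, `sample_augmentation`) are hypothetical, and only the numeric values (K = 3, τ starting at 5 and decreasing by 0.1 every 1,000 steps, θ initialized uniformly to 1/K, and the augmentation ranges) come from the paper's description.

```python
import math
import random

# Hyperparameters quoted from the paper's experiment setup.
K = 3             # number of joint groups
TAU_INIT = 5.0    # initial temperature tau of Eq. (8)
TAU_DECAY = 0.1   # tau decreases by 0.1 ...
TAU_STEP = 1000   # ... every 1,000 training steps, until it reaches ~0


def tau_at_step(step: int) -> float:
    """Annealed temperature: 5.0 minus 0.1 per 1,000 steps, floored near 0."""
    return max(TAU_INIT - TAU_DECAY * (step // TAU_STEP), 1e-6)


def init_theta(num_joints: int, k: int = K) -> list:
    """Uniform group-assignment parameters theta_{i,j} = 1/K (no group prior)."""
    return [[1.0 / k] * k for _ in range(num_joints)]


def sample_augmentation(rng=random) -> dict:
    """One draw of the paper's online augmentation parameters."""
    return {
        "scale": rng.uniform(1.0, 1.2),                          # [1, 1.2]
        "rotation": rng.uniform(-math.pi, math.pi),              # [-pi, pi]
        "translation": (rng.uniform(-20, 20), rng.uniform(-20, 20)),
        "hue": rng.uniform(-0.1, 0.1),                           # color jitter
    }
```

Note that the paper trains in two stages (2D initialization of the hourglass backbone, then the full loss of Eq. (10) with per-module learning rates of 1e-1, 1e-2, and 1e-4); the sketch above only covers the scalar schedules and augmentation sampling, since the module-wise optimizer setup depends on the authors' network definition.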