Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition

Authors: Benjia Zhou, Yunan Li, Jun Wan. Pages 3563-3571.

AAAI 2021

Reproducibility assessment (Variable: Result, followed by the supporting evidence):
Research Type: Experimental. Evidence: "Extensive experiments on two recent large-scale RGB-D gesture datasets validate the effectiveness of the proposed method and show it outperforms state-of-the-art methods." The paper also contains an Experiments section with subsections such as Datasets, Experimental Setup, Comparison with State-of-the-art Methods, and Ablation Studies.
Researcher Affiliation: Academia. Evidence: (1) Macau University of Science and Technology, Macau SAR, China; (2) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China; (3) School of Computer Science and Technology, Xidian University, China; (4) Xi'an Key Laboratory of Big Data and Intelligent Vision, China; (5) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China.
Pseudocode: No. The paper describes its methods verbally and with equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes. Evidence: "The codes of our method are available at: https://github.com/zhoubenjia/RAAR3DNet."
Open Datasets: Yes. Evidence: "We evaluate our method and compare it with other state-of-the-art methods on two RGB-D gesture datasets: the Chalearn IsoGD dataset (Wan et al. 2016) and the NvGesture dataset (Molchanov et al. 2016)." The ablation studies are additionally conducted on a hand-centred action dataset, THU-READ (Tang et al. 2017, 2018), to show the generality of the network.
Dataset Splits: Yes. Evidence: "if the accuracy on the validation set [has] not improved every 3 epochs, it is reduced by 10 times" and "Since most of [the] methods release their result[s] on the validation subset, we also conduct experiments on it for a fair comparison."
Hardware Specification: Yes. Evidence: "Our experiments are all conducted with Pytorch on the NVIDIA RTX 2080 Ti GPU."
Software Dependencies: No. The paper mentions PyTorch but does not specify its version or any other software dependency with a version number.
Experiment Setup: Yes. During training, inputs are spatially resized to 256×256 and then randomly cropped to 224×224; at test time they are center cropped to 224×224. Data is fed to the network in mini-batches of 64 samples. Optimization uses SGD with a weight decay of 0.0003 and a momentum of 0.9. The learning rate is initially 0.01; if the accuracy on the validation set has not improved for 3 epochs, it is divided by 10. Training stops after 80 epochs or when the learning rate falls below 1e-5. The balancing parameter is set to γ = 100.
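The plateau-based learning-rate decay described above can be sketched in plain Python. This is an illustrative simulation of the schedule only, not the authors' code; the function name and parameters are hypothetical. In PyTorch the same behavior corresponds roughly to `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.1, patience=3)` combined with an 80-epoch / minimum-learning-rate stopping rule.

```python
def lr_schedule(val_accs, lr0=0.01, factor=0.1, patience=3,
                min_lr=1e-5, max_epochs=80):
    """Simulate the plateau-based decay described in the paper:
    start at lr0; if validation accuracy has not improved for
    `patience` consecutive epochs, multiply the learning rate by
    `factor` (i.e. divide by 10); stop after `max_epochs` epochs
    or once the learning rate falls below `min_lr`."""
    lr, best, stale = lr0, float("-inf"), 0
    history = []                      # lr actually used each epoch
    for acc in val_accs[:max_epochs]:
        history.append(lr)
        if acc > best:                # improvement: reset the counter
            best, stale = acc, 0
        else:
            stale += 1
            if stale >= patience:     # plateau: decay and reset
                lr *= factor
                stale = 0
        if lr < min_lr:               # early-stop criterion
            break
    return history
```

For example, with a validation-accuracy trace that improves for three epochs and then plateaus, the rate stays at 0.01 for six epochs and drops to 0.001 at the seventh.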