Attention in Convolutional LSTM for Gesture Recognition

Authors: Liang Zhang, Guangming Zhu, Lin Mei, Peiyi Shen, Syed Afaq Ali Shah, Mohammed Bennamoun

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The evaluation results demonstrate that the spatial convolutions in the three gates scarcely contribute to the spatiotemporal feature fusion, and the attention mechanisms embedded into the input and output gates cannot improve the feature fusion.
Researcher Affiliation | Academia | Liang Zhang (Xidian University, liangzhang@xidian.edu.cn); Guangming Zhu (Xidian University, gmzhu@xidian.edu.cn); Lin Mei (Xidian University, l_mei72@hotmail.com); Peiyi Shen (Xidian University, pyshen@xidian.edu.cn); Syed Afaq Ali Shah (Central Queensland University, afaq.shah@uwa.edu.au); Mohammed Bennamoun (University of Western Australia, mohammed.bennamoun@uwa.edu.au)
Pseudocode | No | The paper describes its formulations using mathematical equations and figures, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code of the LSTM variants is publicly available" (footnote 2: https://github.com/GuangmingZhu/AttentionConvLSTM).
Open Datasets | Yes | The proposed variants of ConvLSTM are evaluated on the large-scale isolated gesture datasets Jester [18] and IsoGD [19] in this paper. Jester [18] is a large collection of densely-labeled video clips (https://www.twentybn.com/datasets/jester, 2017). IsoGD [19] is a large-scale isolated gesture dataset which contains 47,933 RGB+D gesture videos of 249 kinds of gestures performed by 21 subjects. The dataset has been used in the 2016 [24] and 2017 [25] ChaLearn LAP Large-scale Isolated Gesture Recognition Challenges.
Dataset Splits | No | "The evaluation on Jester has almost the same accuracy except for variant (b). The similar recognition results on Jester may be caused by the network capacity or the distinguishability of the data, because the validation has a comparable accuracy with the training." While validation is mentioned, the paper does not specify explicit training/validation/test split percentages or sample counts for dataset partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions components like Res3D and MobileNet, but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | For the training on Jester, the learning rate follows a polynomial decay from 0.001 to 0.000001 over a total of 30 epochs. The input is a batch of 16 video clips, and each clip contains 16 frames with a spatial size of 112×112. During the fine-tuning on IsoGD, the batch size is set to 8, the temporal length is set to 32, and a total of 15 epochs are performed for each variant. Top-1 accuracy is used as the evaluation metric. Stochastic gradient descent (SGD) is used for training. The filter numbers of ConvLSTM and the variants are all set to 256.
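For context on the claim that "the spatial convolutions in the three gates scarcely contribute" (Research Type row above), the paper's variants modify the standard ConvLSTM cell of Shi et al. (2015), whose gates are computed with convolutions rather than matrix products (peephole terms omitted here for brevity; `*` denotes convolution and `∘` the Hadamard product):

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\!\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\!\left(C_t\right)
\end{aligned}
```

The variants studied in the paper replace or augment the convolutions in the input, forget, and output gates ($i_t$, $f_t$, $o_t$), which is why the finding that those spatial convolutions contribute little is the headline result.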
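The learning-rate schedule quoted above (polynomial decay from 0.001 to 0.000001 over 30 epochs) can be sketched as follows. The decay power is an assumption on our part, since the paper does not state it; `power = 1.0` reduces to plain linear decay.

```python
def poly_decay_lr(epoch, total_epochs=30, lr_start=1e-3, lr_end=1e-6, power=1.0):
    """Polynomial learning-rate decay from lr_start down to lr_end.

    Note: the paper only specifies the endpoints and the epoch count;
    the decay power here is an assumption (1.0 = linear decay).
    """
    frac = min(epoch, total_epochs) / total_epochs
    return (lr_start - lr_end) * (1.0 - frac) ** power + lr_end

# Build the per-epoch schedule: starts at 1e-3, ends at 1e-6.
schedule = [poly_decay_lr(e) for e in range(31)]
```

Frameworks ship equivalent ready-made schedulers (e.g. TensorFlow's `PolynomialDecay` or PyTorch's `PolynomialLR`), so a sketch like this is mainly useful for checking what the quoted hyperparameters imply per epoch.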