Hand-Model-Aware Sign Language Recognition

Authors: Hezhen Hu, Wengang Zhou, Houqiang Li

AAAI 2021, pp. 1558-1566

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the effectiveness of our method, we perform extensive experiments on four benchmark datasets, including NMFs-CSL, SLR500, MSASL and WLASL. Experimental results demonstrate that our method achieves state-of-the-art performance on all four popular benchmarks with a notable margin.
Researcher Affiliation | Academia | Hezhen Hu (1), Wengang Zhou (1, 2), Houqiang Li (1, 2); (1) CAS Key Laboratory of GIPAS, EEIS Department, University of Science and Technology of China; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; alexhu@mail.ustc.edu.cn, {zhwg, lihq}@ustc.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to its source code or any statement about a code release.
Open Datasets | Yes | We evaluate our proposed method on four publicly available datasets, including NMFs-CSL (Hu et al. 2020), SLR500 (Huang et al. 2019), MSASL (Joze and Koller 2019) and WLASL (Li et al. 2020b).
Dataset Splits | Yes | MSASL is an American sign language dataset (ASL) with a vocabulary size of 1,000. It is collected from Web videos. It contains 25,513 samples in total with 16,054, 5,287 and 4,172 for training, validation and testing, respectively.
Hardware Specification | Yes | In our experiment, all the models are implemented in PyTorch (Paszke et al. 2019) platform and trained on NVIDIA RTX-TITAN.
Software Dependencies | No | In our experiment, all the models are implemented in PyTorch (Paszke et al. 2019) platform and trained on NVIDIA RTX-TITAN. ... We use OpenPose (Cao et al. 2019; Simon et al. 2017) to extract the full keypoints...
Experiment Setup | Yes | Temporally, we extract 32 frames using random and center sampling during training and testing, respectively. During training, the input frames are randomly cropped to 256×256 at the same spatial position. Then the frames are randomly horizontally flipped with a probability of 0.5. During testing, the input video is center cropped to 256×256 and fed into the model. The model is trained with the Stochastic Gradient Descent (SGD) optimizer. The weight decay and momentum are set to 1e-4 and 0.9, respectively. We set the initial learning rate as 5e-3 and reduce it by a factor of 0.1 when the validation loss is saturated. In all experiments, the hyperparameters ϵ, w_β, λ_spa, λ_tem, λ_reg, α_0, α_1 and α_2 are set to 0.4, 10, 0.1, 0.1, 0.1, 1, 2.5 and 4, respectively.
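
The quoted experiment setup maps onto standard PyTorch and torchvision components. The sketch below is a minimal illustration under that assumption, not the authors' code (none is released): the backbone model is a hypothetical placeholder, and only the values quoted above (256×256 random/center crops, horizontal flip with p = 0.5, SGD with learning rate 5e-3, momentum 0.9, weight decay 1e-4, and a 0.1 reduction of the learning rate when the validation loss saturates) come from the paper.

# A minimal sketch of the quoted training configuration, assuming standard
# PyTorch / torchvision APIs. The backbone below is a hypothetical stand-in;
# the paper's hand-model-aware network is not publicly released.
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchvision import transforms

# Spatial pipeline: random 256x256 crop and horizontal flip (p = 0.5) during
# training; a center 256x256 crop during testing.
train_transform = transforms.Compose([
    transforms.RandomCrop(256),
    transforms.RandomHorizontalFlip(p=0.5),
])
test_transform = transforms.CenterCrop(256)

# Placeholder backbone; the 1,000-way output matches the MSASL vocabulary
# size quoted above.
model = nn.Linear(3 * 256 * 256, 1000)

# SGD with initial learning rate 5e-3, momentum 0.9 and weight decay 1e-4;
# the rate is cut by a factor of 0.1 when the validation loss saturates.
optimizer = SGD(model.parameters(), lr=5e-3, momentum=0.9, weight_decay=1e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

# Loss weights as quoted: lambda_spa = lambda_tem = lambda_reg = 0.1.
loss_weights = {"spa": 0.1, "tem": 0.1, "reg": 0.1}

# After each validation pass, step the scheduler on the measured loss.
val_loss = 1.0  # placeholder value for illustration
scheduler.step(val_loss)

The 32-frame temporal sampling (random clips for training, center clips for testing) would belong in the data-loading stage and is omitted here, as are the hand-model-aware branches weighted by α_0, α_1 and α_2.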