Video-Based Sign Language Recognition Without Temporal Segmentation
Authors: Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are carried out on two large-scale datasets. Experimental results demonstrate the effectiveness of the proposed framework. |
| Researcher Affiliation | Collaboration | Jie Huang, Wengang Zhou, Houqiang Li, and Weiping Li: Department of Electronic Engineering and Information Science, University of Science and Technology of China. Qilin Zhang: HERE Technologies, Chicago, Illinois, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'The CSL dataset in Tab. 1 is collected by us and released on our project web page,' providing a link to the dataset. However, it does not provide access to the source code implementing the proposed methodology. |
| Open Datasets | Yes | Two open-source continuous SLR datasets are used in the following experiments: one for Chinese sign language (CSL) and the other the German sign language dataset RWTH-PHOENIX-Weather (Koller, Forster, and Ney 2015). The CSL dataset in Tab. 1 is collected by the authors and released on their project web page: http://mccipc.ustc.edu.cn/mediawiki/index.php/SLR_Dataset |
| Dataset Splits | Yes | The CSL dataset contains 25K labeled video instances... 17K instances are selected for training, 2K for validation, and the remaining 6K for testing. The RWTH-PHOENIX-Weather dataset contains 7K weather forecast sentences from 9 signers... Following (Koller, Forster, and Ney 2015), 5,672 instances are used for training, 540 for validation, and 629 for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and models like CNN, LSTM, Faster R-CNN, and C3D, but does not provide specific version numbers for any ancillary software dependencies. |
| Experiment Setup | Yes | Per (Tran et al. 2015), videos are divided into 16-frame clips with 50% overlap, with frames cropped and resized to 227 × 227. The outputs of the 4096-dimensional fc6 layer of the 2-stream 3D CNN serve as clip representations. The following parameters are set based on the validation set: the dimension of the latent space and the size of the hidden layer in HAN are both 1024; the trade-off parameter λ1 in Eq. (9) between the relevance loss and the coherence loss is set to 0.6; the regularization parameter is empirically set to 0.0001. (Sketches of the clip-division step and the loss composition follow the table.) |
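To make the clip-division step concrete, here is a minimal sketch of the 16-frame, 50%-overlap segmentation described above. It assumes the video has already been decoded, cropped, and resized to 227 × 227; the function name `make_clips` and the NumPy representation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_clips(frames: np.ndarray, clip_len: int = 16, overlap: float = 0.5) -> np.ndarray:
    """Split a video into fixed-length clips with fractional overlap.

    frames: array of shape (num_frames, 227, 227, 3), i.e. frames already
    cropped and resized as in the paper. Returns an array of shape
    (num_clips, clip_len, 227, 227, 3); each clip would then be fed to the
    2-stream 3D CNN, whose fc6 layer yields a 4096-dimensional representation.
    """
    if len(frames) < clip_len:
        raise ValueError("video shorter than one clip")
    stride = int(clip_len * (1.0 - overlap))  # 8-frame stride for 50% overlap
    starts = range(0, len(frames) - clip_len + 1, stride)
    return np.stack([frames[s:s + clip_len] for s in starts])

clips = make_clips(np.zeros((100, 227, 227, 3), dtype=np.float32))
print(clips.shape)  # (11, 16, 227, 227, 3): clips start at frames 0, 8, ..., 80
```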
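The reported hyperparameters can likewise be gathered in one place. The composition below is a hedged sketch: only λ1 = 0.6, the regularization weight 0.0001, and the 1024-dimensional latent space and HAN hidden layer are stated in the table; the exact form of Eq. (9) and the helper names `relevance_loss`, `coherence_loss`, and `l2_penalty` are assumptions for illustration.

```python
LAMBDA_1 = 0.6      # trade-off between relevance and coherence losses (Eq. 9)
REG = 1e-4          # empirically chosen regularization weight
LATENT_DIM = 1024   # dimension of the latent space
HAN_HIDDEN = 1024   # hidden-layer size of the hierarchical attention network (HAN)

def total_loss(relevance_loss: float, coherence_loss: float, l2_penalty: float) -> float:
    # Assumed composition: weight the coherence loss by LAMBDA_1 and add an
    # L2 regularization term; the paper's actual Eq. (9) is not reproduced here.
    return relevance_loss + LAMBDA_1 * coherence_loss + REG * l2_penalty
```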