Two-Stream Network for Sign Language Recognition and Translation

Authors: Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, our TwoStream-SLR and TwoStream-SLT achieve state-of-the-art performance on SLR and SLT tasks across a series of datasets including Phoenix-2014, Phoenix-2014T, and CSL-Daily.
Researcher Affiliation | Collaboration | Microsoft Research Asia; The Hong Kong University of Science and Technology
Pseudocode | No | The paper describes the architecture and components of the proposed network using figures and textual descriptions but does not provide pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | We use Phoenix-2014 [27], Phoenix-2014T [6], and CSL-Daily [55] to evaluate our method on SLR, while the last two datasets are also leveraged for SLT evaluation since they provide text annotations.
Dataset Splits | Yes | Phoenix-2014 is a German SLR dataset with a vocabulary size of 1081 for glosses. It consists of 5672, 540, and 629 samples in the training, dev, and test sets. Phoenix-2014T ... There are 7096, 519, and 642 samples in the training, dev, and test sets. CSL-Daily ... It consists of 18401, 1077, and 1176 samples in the training, dev, and test sets.
Hardware Specification | Yes | We train our models on 8 Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions software components such as 'mBART' and 'HRNet' along with their training data, but it does not specify version numbers for general software dependencies such as Python, PyTorch, CUDA, or other libraries.
Experiment Setup | Yes | Unless otherwise specified, we set λV = 0.2 and λK = 0.5 in Eq. 1, and the beam width to 5 for the CTC decoder and the SLT decoder during inference. We use a cosine annealing schedule of 40 epochs and an Adam optimizer with weight decay 1e-3, initial learning rate 1e-3 for TwoStream-SLR and 1e-5 for the MLP and translation network in TwoStream-SLT.
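The experiment-setup and dataset-split entries above can be collected into a short configuration sketch. This is a minimal illustration in PyTorch, not the authors' released code: the function name build_optimizers, the parameter-group arguments, and the reading of Eq. 1 as a joint CTC loss plus weighted video- and keypoint-stream CTC losses are assumptions; only the numeric values (λV = 0.2, λK = 0.5, beam width 5, 40-epoch cosine schedule, Adam with weight decay 1e-3, learning rates 1e-3 and 1e-5, and the split sizes) come from the table.

import torch

# Dataset split sizes (train, dev, test) as reported in the table above.
SPLITS = {
    "Phoenix-2014": (5672, 540, 629),
    "Phoenix-2014T": (7096, 519, 642),
    "CSL-Daily": (18401, 1077, 1176),
}

# Reported loss weights in Eq. 1 and decoding beam width.
LAMBDA_V, LAMBDA_K = 0.2, 0.5
BEAM_WIDTH = 5  # for both the CTC decoder and the SLT decoder at inference


def slr_loss(loss_joint, loss_video, loss_keypoint):
    """Assumed reading of Eq. 1: joint CTC loss plus weighted per-stream CTC losses."""
    return loss_joint + LAMBDA_V * loss_video + LAMBDA_K * loss_keypoint


def build_optimizers(slr_params, mlp_and_translation_params):
    """Reported optimization setup: Adam, weight decay 1e-3, cosine annealing over 40
    epochs; lr 1e-3 for TwoStream-SLR, 1e-5 for the MLP and translation network."""
    slr_opt = torch.optim.Adam(slr_params, lr=1e-3, weight_decay=1e-3)
    slt_opt = torch.optim.Adam(mlp_and_translation_params, lr=1e-5, weight_decay=1e-3)
    slr_sched = torch.optim.lr_scheduler.CosineAnnealingLR(slr_opt, T_max=40)
    slt_sched = torch.optim.lr_scheduler.CosineAnnealingLR(slt_opt, T_max=40)
    return slr_opt, slt_opt, slr_sched, slt_sched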