Efficient Sign Language Translation with a Curriculum-based Non-autoregressive Decoder

Authors: Pei Yu, Liang Zhang, Biao Fu, Yidong Chen

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on PHOENIX2014T and CSL-Daily demonstrate that our model consistently outperforms all competitive baselines and achieves 7.92/8.02 speedup compared to the AR SLT model respectively. Our source code is available at https://github.com/yp20000921/CND.
Researcher Affiliation | Academia | Pei Yu (1,2), Liang Zhang (1,2), Biao Fu (1,2), Yidong Chen (1,2); (1) School of Informatics, Xiamen University, China; (2) Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan (Xiamen University), Ministry of Culture and Tourism, China; yupei@stu.xmu.edu.cn, ydchen@xmu.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in prose and diagrams.
Open Source Code | Yes | Our source code is available at https://github.com/yp20000921/CND.
Open Datasets | Yes | We evaluate our approach on two popular benchmark datasets for sign language translation task, i.e., PHOENIX 2014T [Camgoz et al., 2018] and CSL-Daily [Zhou et al., 2021a].
Dataset Splits | Yes | PHOENIX 2014T ... includes 7,096/519/642 continuous sign language videos in train/dev/test splits. CSL-Daily ... includes 18,401/1,077/1,176 continuous sign language videos in train/dev/test splits. (See also the split-count sketch below the table.)
Hardware Specification | Yes | Our model is developed based on PyTorch and all the experiments are run on 1 Titan RTX GPU.
Software Dependencies | No | The paper mentions PyTorch but does not give version numbers for it or for any other software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | Then we train all the models for 60 epochs using Adam (β1 = 0.9, β2 = 0.998) [Kingma and Ba, 2014] with a Linear Warm-up Scheduler [Goyal et al., 2017], where the peak learning rate is set to 5e-4 and warm-up step is 8K. The hyper-parameters α and β are set to 0.5 and 1, respectively. The encoder/decoder layer number N is set to 5. For NAR inference, we follow [Ghazvininejad et al., 2019] and set the length beam to 5 and for AR inference the beam size is set to 5.
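
To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of the reported optimizer and schedule: Adam with betas (0.9, 0.998), a linear warm-up over 8K steps to a peak learning rate of 5e-4, and 60 training epochs. The model stand-in, the flat post-warm-up schedule, and where the α = 0.5 / β = 1 loss weighting would enter are assumptions for illustration, not the authors' implementation; their actual training code is in the linked repository.

```python
# Minimal sketch of the reported optimization setup, assuming standard
# PyTorch APIs. The model and the post-warm-up behaviour are placeholders;
# see https://github.com/yp20000921/CND for the authors' actual code.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 5e-4        # peak learning rate reported in the paper
WARMUP_STEPS = 8_000  # warm-up steps reported in the paper
EPOCHS = 60           # training epochs reported in the paper

model = torch.nn.Linear(512, 512)  # stand-in for the 5-layer encoder/decoder SLT model
optimizer = Adam(model.parameters(), lr=PEAK_LR, betas=(0.9, 0.998))

def warmup_factor(step: int) -> float:
    # Linear warm-up to the peak LR; the paper does not spell out the
    # post-warm-up decay, so a flat schedule is assumed here.
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = LambdaLR(optimizer, lr_lambda=warmup_factor)

# Training-loop skeleton; the weighted loss (alpha = 0.5, beta = 1 over the
# paper's loss terms) would be combined where `loss` is built.
# for epoch in range(EPOCHS):
#     for batch in loader:
#         loss = ...
#         loss.backward()
#         optimizer.step()
#         scheduler.step()
#         optimizer.zero_grad()
```

A common alternative after warm-up is inverse-square-root decay; swapping it in would only change warmup_factor, not the rest of the setup.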
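
The Dataset Splits row is easy to mis-transcribe, so here is a small Python sketch that records the reported split sizes in a plain dictionary and prints a per-dataset summary. The dictionary layout and the summarize helper are illustrative only, not taken from the authors' repository.

```python
# Split sizes as reported in the paper. The dictionary layout and the
# summarize() helper are illustrative only, not the authors' code.
SPLITS = {
    "PHOENIX2014T": {"train": 7_096, "dev": 519, "test": 642},
    "CSL-Daily": {"train": 18_401, "dev": 1_077, "test": 1_176},
}

def summarize(name: str, counts: dict) -> str:
    """Return a one-line summary: train/dev/test counts plus the total."""
    total = sum(counts.values())
    return f"{name}: {counts['train']}/{counts['dev']}/{counts['test']} videos (total {total})"

for name, counts in SPLITS.items():
    print(summarize(name, counts))
```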