Contrastive Learning for Sign Language Recognition and Translation

Authors: Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Kang Xia, Lei Xie, Sanglu Lu

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance."
Researcher Affiliation | Academia | "Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Kang Xia, Lei Xie and Sanglu Lu, State Key Laboratory for Novel Software Technology, Nanjing University, China. sw@smail.nju.edu.cn, {yafeng, jzw}@nju.edu.cn, xiakang@smail.nju.edu.cn, {lxie, sanglu}@nju.edu.cn"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a concrete access link or explicit statement about the release of source code.
Open Datasets | Yes | "We test our model on public sign language datasets that currently are often used. (1) Phoenix14T [Camgoz et al., 2018] contains 7096, 519 and 642 samples from 9 signers for training, validation, and testing respectively. (2) The CSL-daily dataset [Zhou et al., 2021a] contains 18401, 1077 and 1176 labeled videos from 10 signers for training, validation and testing respectively. (3) Phoenix14 [Koller et al., 2015] contains 5672, 540, 629 samples from 9 signers for training, validation, and testing respectively and it has a vocabulary of 1295 glosses for CSLR only."
Dataset Splits | Yes | Same quoted passage as Open Datasets above; the per-split sample counts double as the train/validation/test splits. (A summary appears in the first sketch below the table.)
Hardware Specification | Yes | "We adopt Adam optimizer with a weight decay of 0.0001 to train our model for 70 epochs on 2 GeForce RTX 3090 GPUs."
Software Dependencies | Yes | "Our architecture adopts the components provided by PyTorch 1.11."
Experiment Setup | Yes | "Training setting. To train our model, we use the following settings for CSLR and SLT. We adopt Adam optimizer with a weight decay of 0.0001 to train our model for 70 epochs on 2 GeForce RTX 3090 GPUs. The initial learning rate is 0.0001 with a decay factor of 0.5 and the batch size is set to 6." (A hedged training sketch appears below the table.)
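
For quick reference, the split sizes quoted in the Open Datasets row can be collected in a small Python dict. The layout, key names, and the "dev" label for validation are mine, not the paper's:

```python
# Split sizes exactly as quoted in the paper; the dict layout and
# key names are my choices, not the authors'.
DATASET_SPLITS = {
    "Phoenix14T": {"train": 7096, "dev": 519, "test": 642, "signers": 9},
    "CSL-Daily": {"train": 18401, "dev": 1077, "test": 1176, "signers": 10},
    "Phoenix14": {"train": 5672, "dev": 540, "test": 629, "signers": 9},
}

# Phoenix14 additionally has a vocabulary of 1295 glosses (CSLR only).
PHOENIX14_GLOSS_VOCAB = 1295
```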
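
The Experiment Setup row pins down the optimizer, initial learning rate, weight decay, epoch count, and batch size. Below is a minimal PyTorch sketch of that configuration, assuming a placeholder `nn.Linear` model and omitting the actual data pipeline. The paper quotes a decay factor of 0.5 but not the epochs at which it is applied, so the `MultiStepLR` milestones are placeholder assumptions:

```python
import torch
import torch.nn as nn

# Placeholder model: the paper's actual architecture is not reproduced here.
model = nn.Linear(512, 1295)  # hypothetical input/output sizes

# Quoted settings: Adam optimizer, initial lr 0.0001, weight decay 0.0001.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

# The paper gives a decay factor of 0.5 but not when it is applied;
# the milestones here are placeholders, not the authors' schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 60], gamma=0.5
)

BATCH_SIZE = 6  # quoted batch size

for epoch in range(70):  # quoted: 70 epochs
    # ... forward pass, loss, and loss.backward() over batches of 6 go here ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # halves the lr at the placeholder milestones
```

The quoted setup distributes training across 2 GeForce RTX 3090 GPUs; the sketch leaves that out, since the paper does not say whether DataParallel or DistributedDataParallel was used.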