Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation
Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on two public benchmark sign language translation datasets, namely RWTH-PHOENIX-Weather 2014T and CSL-Daily, and improve on state-of-the-art gloss-free translation performance with a significant margin. ... 4.3 COMPARISONS WITH STATE-OF-THE-ART METHODS ... 4.5 ABLATION STUDY |
| Researcher Affiliation | Collaboration | Ryan Wong¹, Necati Cihan Camgoz², Richard Bowden¹; ¹University of Surrey, ²Meta Reality Labs |
| Pseudocode | No | The paper describes its methods in text and with diagrams (Figure 1, Figure 2, Figure 3) but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper mentions providing 'details of the training settings' and 'details of the libraries we used for the pretrained models in Appendix A.1' for reproducibility, but it does not state that the source code for the methodology described in the paper is openly available or provide a link to it. |
| Open Datasets | Yes | We evaluate our approach on two public benchmark sign language translation datasets, namely RWTH-PHOENIX-Weather 2014T and CSL-Daily. ... RWTH-PHOENIX-WEATHER-2014T (Phoenix14T) (Camgoz et al., 2018) is a German Sign Language dataset... CSL-Daily (Zhou et al., 2021) is a translation dataset... |
| Dataset Splits | No | The paper states 'We conduct our ablation studies on the Phoenix14T dataset, evaluating the BLEU-4 score on the development set.' implying the use of a validation/development set, but it does not specify the exact split percentages or sample counts for training, validation, and test sets, nor does it cite a predefined split. |
| Hardware Specification | Yes | The model is trained end-to-end with a batch size of 8 on two A100 GPUs |
| Software Dependencies | No | Appendix A.1 lists some software components used, such as Dino-V2, XGLM (pretrained weights), SpaCy, and FastText embeddings. However, it does not provide specific version numbers for these libraries, except for a general mention of 'Flash attention v2'. |
| Experiment Setup | Yes | The model is trained end-to-end with a batch size of 8 on two A100 GPUs, subsampling every second frame. ... The sign encoder is a 4 layer transformer with hidden dimension of 512, 8 attention heads and intermediate size of 2048. The temporal downsampling is applied after the 2nd layer. ... We employ the Adam optimizer ... with a learning rate of 3 × 10⁻⁴ and weight decay of 0.001. Training spans 100 epochs with gradient clipping of 1.0 and includes a one-cycle cosine learning rate scheduler ... with warmup for the initial 5 epochs. ... We initialize the prototype (τU) and time temperature (τT) to 0.1. ... We utilize cross-entropy loss with label smoothing set to 0.1 during training. The LoRA rank and alpha values are both set to 4. During inference, we employ a beam search with a width of 4. |
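
The quoted Experiment Setup row can be expressed as a training configuration. The sketch below is a minimal, hypothetical reconstruction in PyTorch, not the authors' released code: `model`, `STEPS_PER_EPOCH`, and the `training_step` helper are placeholders, while the numeric values (learning rate, weight decay, warmup, clipping, label smoothing, LoRA rank/alpha, beam width) come from the quote above.

```python
# Hypothetical sketch of the reported training configuration.
# `model` and STEPS_PER_EPOCH are placeholders; hyperparameter values
# are taken from the paper's quoted experiment setup.
import torch
from torch import nn

model = nn.Linear(512, 512)      # stand-in for the sign encoder + adapted LLM decoder

EPOCHS = 100
WARMUP_EPOCHS = 5
STEPS_PER_EPOCH = 1000           # assumption: depends on dataset size and batch size 8

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=3e-4,                     # learning rate 3 x 10^-4
    weight_decay=1e-3,           # weight decay 0.001
)

# One-cycle cosine learning rate schedule with warmup over the first 5 of 100 epochs.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-4,
    total_steps=EPOCHS * STEPS_PER_EPOCH,
    pct_start=WARMUP_EPOCHS / EPOCHS,
    anneal_strategy="cos",
)

# Cross-entropy loss with label smoothing of 0.1.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Remaining settings from the quote, kept as plain values since the exact
# adapter and decoding implementations are not specified in the report.
lora_rank, lora_alpha = 4, 4
beam_width = 4

def training_step(inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step with gradient-norm clipping at 1.0."""
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits, targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    return loss.item()
```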