SLTUNET: A Simple Unified Model for Sign Language Translation
Authors: Biao Zhang, Mathias Müller, Rico Sennrich
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in experiments that SLTUNET achieves competitive and even state-of-the-art performance on PHOENIX-2014T and CSL-Daily when augmented with MT data and equipped with a set of optimization techniques. We further use the DGS Corpus for end-to-end SLT for the first time. It covers broader domains with a significantly larger vocabulary, which is more challenging and which we consider to allow for a more realistic assessment of the current state of SLT than the former two. Still, SLTUNET obtains improved results on the DGS Corpus. |
| Researcher Affiliation | Academia | ¹School of Informatics, University of Edinburgh; ²Department of Computational Linguistics, University of Zurich |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/bzhangGo/sltunet. |
| Open Datasets | Yes | We work on three SLT datasets: PHOENIX-2014T (Camgoz et al., 2018), CSL-Daily (Zhou et al., 2021), and DGS3-T. |
| Dataset Splits | Yes | The split contains 60,306, 967, and 1,575 samples in the train, dev, and test set, respectively (see Table 1 and Appendix A.2 for details). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Moses and SacreBLEU with citations but does not provide specific version numbers for them or other key software dependencies used in the experiments. |
| Experiment Setup | Yes | We experiment with Transformer (Vaswani et al., 2017) and start our analysis with a Baseline system optimized on Sign2Text alone with the following configurations: encoder and decoder layers of N^S_enc = 2, N^P_enc = 0 and N_dec = 2 respectively, model dimension of d = 512, feed-forward dimension of d_ff = 2048, attention heads of h = 8, and no CTC regularization. ... We train all SLT models using Adam (β₁ = 0.9, β₂ = 0.998) (Kingma & Ba, 2015) with Noam learning rate schedule (Vaswani et al., 2017), a label smoothing of 0.1 and warmup step of 4K. We employ Xavier initialization to initialize model parameters with a gain of 1.0. |
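
The quoted experiment setup can be summarized in code. The sketch below is illustrative only, not the authors' implementation: the dictionary keys and the `noam_lr` helper are assumed names, and the official repository (https://github.com/bzhangGo/sltunet) uses its own configuration format. It captures the reported Baseline hyperparameters and the Noam learning rate schedule from Vaswani et al. (2017).

```python
# Minimal sketch of the reported Baseline hyperparameters and the Noam schedule.
# All names here are illustrative assumptions, not taken from the SLTUNET codebase.

# Baseline Sign2Text configuration as quoted in the paper.
BASELINE_CONFIG = {
    "num_shared_encoder_layers": 2,   # N^S_enc
    "num_private_encoder_layers": 0,  # N^P_enc
    "num_decoder_layers": 2,          # N_dec
    "model_dim": 512,                 # d
    "ffn_dim": 2048,                  # d_ff
    "num_heads": 8,                   # h
    "ctc_regularization": False,
    "label_smoothing": 0.1,
    "adam_betas": (0.9, 0.998),
    "warmup_steps": 4000,
    "xavier_gain": 1.0,
}


def noam_lr(step: int, model_dim: int = 512, warmup_steps: int = 4000) -> float:
    """Noam learning rate schedule (Vaswani et al., 2017):
    lr = d^{-0.5} * min(step^{-0.5}, step * warmup^{-1.5})."""
    step = max(step, 1)  # avoid division by zero at step 0
    return model_dim ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # The learning rate rises linearly during warmup and peaks at step 4000.
    for s in (1, 1000, 4000, 40000):
        print(f"step {s:>6d}: lr = {noam_lr(s):.6f}")
```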