Scaling Sign Language Translation
Authors: Biao Zhang, Garrett Tanzer, Orhan Firat
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform large-scale SLT pretraining on different data... We finetune the pretrained SLT models on 5 downstream open-domain SLT benchmarks... Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins. |
| Researcher Affiliation | Industry | Biao Zhang, Garrett Tanzer, Orhan Firat, Google DeepMind {biaojiaxing,gtanzer,orhanf}@google.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The used (m/By)T5 model checkpoints are publicly available, but the framework we used to finetune them with multimodal inputs has not been open sourced, so we are unable to release our code. |
| Open Datasets | Yes | We use the parallel sentence-level portion of MADLAD-400 [23] as the MT pretraining data. ... MADLAD-400 is publicly available, but our noisy YouTube dataset is not. |
| Dataset Splits | Yes | Table 1: Summary of downstream SLT benchmarks. #Train/#Dev/#Test: the number of examples in the train, dev and test split. |
| Hardware Specification | Yes | We pretrain models up to 1M steps using 64/64/128 TPU-v3 chips for Base/Large/XL, taking 720 days. |
| Software Dependencies | No | The paper mentions software components like T5, Adafactor, MediaPipe Holistic landmarks, BLEU, ChrF, and BLEURT, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For Pretraining, we use a batch size of 256 and a constant learning rate of 0.001. We optimize models with Adafactor [39], and set the maximum text input, landmark input, and text output length to 512. For Finetuning, we use a batch size of 32 and a constant learning rate of 0.0005. |
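
The Experiment Setup row reports concrete hyperparameters: Adafactor, constant learning rates (0.001 for pretraining, 0.0005 for finetuning), batch sizes of 256 and 32, and 512-token length limits. A minimal sketch of that optimization setup in JAX/optax is below; the names and structure are illustrative, since the authors' multimodal finetuning framework is not open sourced.

```python
# Minimal sketch of the reported optimization setup using optax (JAX).
# The hyperparameter values come from the paper's experiment setup;
# everything else (names, structure) is an assumption, not the authors' code.
import optax

PRETRAIN = {"batch_size": 256, "learning_rate": 1e-3, "max_length": 512}
FINETUNE = {"batch_size": 32, "learning_rate": 5e-4}


def make_adafactor(learning_rate: float) -> optax.GradientTransformation:
    """Adafactor with a constant learning-rate schedule, as described in the paper."""
    return optax.adafactor(learning_rate=optax.constant_schedule(learning_rate))


pretrain_optimizer = make_adafactor(PRETRAIN["learning_rate"])
finetune_optimizer = make_adafactor(FINETUNE["learning_rate"])
```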
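
The Software Dependencies row notes that the metric tooling (BLEU, ChrF, BLEURT) is listed without version numbers, which matters because BLEU and ChrF scores are sensitive to the implementation and tokenization used. The sketch below uses sacrebleu as one plausible scorer (an assumption; the paper does not name its implementation) and records the package version alongside the scores. BLEURT would additionally require Google's separate `bleurt` package and a specific checkpoint.

```python
# Sketch of corpus-level BLEU and ChrF scoring with sacrebleu.
# sacrebleu is an assumption; the paper does not say which implementation
# (or version) produced its reported numbers.
import sacrebleu

hypotheses = ["a translated sentence", "another translated sentence"]
references = [["a reference sentence", "another reference sentence"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)

# Log the scorer version so the evaluation can be reproduced exactly.
print(f"sacrebleu=={sacrebleu.__version__}  BLEU={bleu.score:.2f}  ChrF={chrf.score:.2f}")
```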