Scaling Sign Language Translation

Authors: Biao Zhang, Garrett Tanzer, Orhan Firat

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform large-scale SLT pretraining on different data... We finetune the pretrained SLT models on 5 downstream open-domain SLT benchmarks... Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins.
Researcher Affiliation | Industry | Biao Zhang, Garrett Tanzer, Orhan Firat, Google DeepMind, {biaojiaxing,gtanzer,orhanf}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The (m/By)T5 model checkpoints used are publicly available, but the framework we used to finetune them with multimodal inputs has not been open sourced, so we are unable to release our code.
Open Datasets | Yes | We use the parallel sentence-level portion of MADLAD-400 [23] as the MT pretraining data. ... MADLAD-400 is publicly available, but our noisy YouTube dataset is not.
Dataset Splits | Yes | Table 1: Summary of downstream SLT benchmarks. #Train/#Dev/#Test: the number of examples in the train, dev and test split.
Hardware Specification | Yes | We pretrain models up to 1M steps using 64/64/128 TPU-v3 chips for Base/Large/XL, taking 720 days.
Software Dependencies | No | The paper mentions software components such as T5, Adafactor, MediaPipe Holistic landmarks, BLEU, ChrF, and BLEURT, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | For pretraining, we use a batch size of 256 and a constant learning rate of 0.001. We optimize models with Adafactor [39], and set the maximum text input, landmark input, and text output length to 512. For finetuning, we use a batch size of 32 and a constant learning rate of 0.0005.
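
The Experiment Setup row condenses the paper's reported optimizer settings. As a minimal, hedged sketch of those hyperparameters, assuming an optax-based training loop (the authors' framework is unreleased, so optax and the config names below are illustrative; only the numbers come from the paper):

# Sketch of the reported hyperparameters using optax's Adafactor.
# The config layout and optax itself are illustrative assumptions.
import optax

MAX_LEN = 512  # maximum text input, landmark input, and text output length

pretrain_config = dict(
    batch_size=256,
    optimizer=optax.adafactor(learning_rate=1e-3),  # constant learning rate 0.001
)

finetune_config = dict(
    batch_size=32,
    optimizer=optax.adafactor(learning_rate=5e-4),  # constant learning rate 0.0005
)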
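
The evaluation metrics named under Software Dependencies (BLEU, ChrF, BLEURT) are not pinned to specific implementations or versions in the paper; the snippet below assumes sacrebleu for BLEU and ChrF, purely as an illustration of how such scores could be computed:

# Hedged sketch: scoring translation outputs with sacrebleu's BLEU and ChrF.
# The paper does not state which metric implementations or versions were used.
import sacrebleu

hypotheses = ["a signer greets the audience"]      # illustrative model outputs
references = [["the signer greets the audience"]]  # one reference stream, parallel to the hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}  ChrF = {chrf.score:.2f}")

# BLEURT additionally requires a learned checkpoint (e.g. via the `bleurt`
# package's BleurtScorer); it is omitted here to keep the sketch dependency-light.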