Levenshtein Transformer
Authors: Jiatao Gu, Changhan Wang, Junbo Zhao
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the efficiency, effectiveness, and flexibility of Levenshtein Transformer extensively across three different tasks: machine translation (MT), text summarization (TS) and automatic post-editing (APE) for machine translation, from both generation (§4.1) and refinement (§4.2) perspectives. ... Table 1: Generation quality (BLEU, ROUGE-1/2/L) and latency (ms) as well as the average number of decoder iterations (I_dec) on the standard test sets for LevT and the autoregressive baseline (with both greedy and beam-search outputs). |
| Researcher Affiliation | Collaboration | Facebook AI Research; New York University; Tigerobo Inc. {jgu, changhan}@fb.com, jakezhao@cs.nyu.edu |
| Pseudocode | No | The paper describes the model's operations and inference steps in prose but does not include structured pseudocode or an algorithm block (a hedged illustrative sketch of this inference loop is given after the table). |
| Open Source Code | Yes | Codes for reproducing this paper are released in https://github.com/pytorch/fairseq/tree/master/examples/nonautoregressive_translation |
| Open Datasets | Yes | We use three diversified language pairs for MT experiments: WMT'16 Romanian-English (Ro-En), WMT'14 English-German (En-De) and WAT2017 Small-NMT English-Japanese (En-Ja, Nakazawa et al., 2017). The TS experiments use preprocessed data from the Annotated English Gigaword (Gigaword, Rush et al., 2015). |
| Dataset Splits | No | The paper mentions using 'standard test sets' and 'training on 8 Nvidia Volta GPUs with maximum 300K steps and a total batch-size of around 65,536 tokens per step'. It also states 'Detailed dataset statistics can be found in the Appendix'. However, specific numerical details or predefined split ratios for training, validation, and test sets are not explicitly provided in the main text. |
| Hardware Specification | Yes | All the Transformer-based models are trained on 8 Nvidia Volta GPUs... We measure the speed by the averaged generation latency of generating one sequence at a time on single Nvidia V100 GPU. |
| Software Dependencies | No | The paper mentions the use of 'pytorch/fairseq' for code release and 'Transformer (Vaswani et al., 2017)' as a building block, but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | All the Transformer-based models are trained on 8 Nvidia Volta GPUs with maximum 300K steps and a total batch-size of around 65,536 tokens per step |
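
Since the paper itself provides no pseudocode, the following is a minimal, hedged sketch of the iterative delete/insert/fill decoding loop that the paper describes in prose. It is not the authors' implementation: the functions `predict_deletions`, `predict_insertions`, and `predict_tokens`, the placeholder symbol `<plh>`, and the toy policies in the usage example are all illustrative stand-ins for the model's deletion, placeholder, and token classifiers.

```python
"""Illustrative sketch (not the released fairseq code) of Levenshtein
Transformer-style iterative refinement: each iteration deletes tokens,
opens placeholder slots, fills them, and stops when the sequence stops
changing or a maximum number of iterations is reached."""
from typing import Callable, List

PLH = "<plh>"          # placeholder symbol opened before token prediction (name assumed)
BOS, EOS = "<s>", "</s>"


def refine(
    tokens: List[str],
    predict_deletions: Callable[[List[str]], List[bool]],   # True = keep this token
    predict_insertions: Callable[[List[str]], List[int]],   # slots between adjacent tokens
    predict_tokens: Callable[[List[str]], List[str]],        # fills every placeholder
    max_iters: int = 10,
) -> List[str]:
    """Run delete / insert-placeholder / fill refinement until convergence."""
    for _ in range(max_iters):
        previous = list(tokens)

        # 1) Deletion: drop tokens marked for deletion (sequence boundaries are kept).
        keep = predict_deletions(tokens)
        tokens = [t for t, k in zip(tokens, keep) if k or t in (BOS, EOS)]

        # 2) Placeholder insertion: open the predicted number of slots
        #    between each pair of adjacent tokens.
        counts = predict_insertions(tokens)
        with_slots: List[str] = []
        for i, tok in enumerate(tokens):
            with_slots.append(tok)
            if i < len(tokens) - 1:
                with_slots.extend([PLH] * counts[i])
        tokens = with_slots

        # 3) Token prediction: replace every placeholder with a real token.
        filled = predict_tokens(tokens)
        tokens = [f if t == PLH else t for t, f in zip(tokens, filled)]

        # Stop early once an iteration leaves the sequence unchanged.
        if tokens == previous:
            break
    return tokens


if __name__ == "__main__":
    # Toy usage: generation starts from an "empty" sequence of just the boundaries.
    out = refine(
        [BOS, EOS],
        predict_deletions=lambda seq: [True] * len(seq),                       # delete nothing
        predict_insertions=lambda seq: [2 if len(seq) == 2 else 0] * (len(seq) - 1),
        predict_tokens=lambda seq: ["hello" if t == PLH else t for t in seq],
    )
    print(out)  # ['<s>', 'hello', 'hello', '</s>']
```

In the paper, refinement starts from an existing draft (e.g., for post-editing) while generation starts from the empty boundary-only sequence, so the same loop covers both settings; this sketch only mirrors that control flow, not the model's learned policies.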