Improving Non-Autoregressive Translation Models Without Distillation

Authors: Xiao Shi Huang, Felipe Perez, Maksims Volkovs

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on multiple public NMT datasets: IWSLT 14 De-En/En-De, WMT 14 De-En/En-De, and WMT 16 Ro-En/En-Ro. We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019) and report test set performance in BLEU for direct comparison. For each dataset we compute performance on both raw and distilled settings, resulting in 12 datasets in total. (See the BLEU-scoring sketch after the table.)
Researcher Affiliation | Industry | Xiao Shi Huang, Felipe Pérez, Maksims Volkovs, Layer 6 AI, {gary,felipe,maks}@layer6.ai
Pseudocode | Yes | Algorithm 1: CMLMC Training. (See the training-objective sketch after the table.)
Open Source Code | Yes | Code for this work is available here: https://github.com/layer6ai-labs/CMLMC.
Open Datasets | Yes | We evaluate our approach on multiple public NMT datasets: IWSLT 14 De-En/En-De, WMT 14 De-En/En-De, and WMT 16 Ro-En/En-Ro. We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019).
Dataset Splits | Yes | We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019).
Hardware Specification | Yes | We train the models on the IBM servers with 160 POWER9 CPUs, 600GB RAM and 4 Tesla V100 GPUs (32G).
Software Dependencies | No | The paper mentions using the Fairseq library and the Adam optimizer but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Hyper-parameters for each dataset are selected through grid search and are listed in Table B.1 in the Appendix.
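The Research Type row quotes the paper's evaluation protocol: test-set BLEU on each raw and distilled dataset. As an illustration only, the snippet below scores detokenized hypotheses against references with sacrebleu; the file names are hypothetical, and the paper's actual scoring pipeline (for example, tokenized BLEU through fairseq's generation scripts) may differ.

```python
import sacrebleu

# Hypothetical file names; the paper's own evaluation pipeline is not shown here.
with open("test.hyp.detok") as f:
    hypotheses = [line.strip() for line in f]
with open("test.ref.detok") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes a list of hypothesis strings and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```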
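The Pseudocode row refers to Algorithm 1 (CMLMC Training) in the paper. The sketch below is not a reproduction of that algorithm; it is a minimal PyTorch illustration, under the assumption that CMLMC combines the standard CMLM masked-prediction loss with an additional correction loss on observed target tokens that have been corrupted. Position sampling, the source of the corrupted tokens, loss weighting, and padding handling in the actual algorithm may differ; `model`, `mask_id`, and `corrupt_prob` are placeholders.

```python
import torch
import torch.nn.functional as F

def cmlmc_style_loss(model, src_tokens, tgt_tokens, mask_id, vocab_size, corrupt_prob=0.15):
    """Illustrative CMLM-style loss with an added correction term (not the paper's Algorithm 1).

    A random subset of target positions is masked and predicted (standard CMLM); a further
    subset of the observed positions is replaced with random tokens, and the model is trained
    to recover the original tokens (correction term). Padding handling is omitted for brevity.
    """
    bsz, tgt_len = tgt_tokens.shape
    device = tgt_tokens.device

    # Sample how many positions to mask per sentence (uniform over 1..tgt_len, as in CMLM).
    num_to_mask = torch.randint(1, tgt_len + 1, (bsz, 1), device=device)
    ranks = torch.rand(bsz, tgt_len, device=device).argsort(dim=1).argsort(dim=1)
    masked = ranks < num_to_mask                      # boolean mask of positions to predict

    # Corrupt a fraction of the remaining observed tokens with random vocabulary items.
    corrupted = (~masked) & (torch.rand(bsz, tgt_len, device=device) < corrupt_prob)
    noise = torch.randint(0, vocab_size, (bsz, tgt_len), device=device)

    decoder_input = tgt_tokens.clone()
    decoder_input[masked] = mask_id
    decoder_input[corrupted] = noise[corrupted]

    logits = model(src_tokens, decoder_input)         # (bsz, tgt_len, vocab_size)

    # Masked-prediction loss plus correction loss on the corrupted observed positions.
    prediction_loss = F.cross_entropy(logits[masked], tgt_tokens[masked])
    correction_loss = (F.cross_entropy(logits[corrupted], tgt_tokens[corrupted])
                       if corrupted.any() else logits.new_zeros(()))
    return prediction_loss + correction_loss
```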