Improving Non-Autoregressive Translation Models Without Distillation
Authors: Xiao Shi Huang, Felipe Pérez, Maksims Volkovs
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on multiple public NMT datasets: IWSLT 14 De-En/En-De, WMT 14 De-En/En-De, and WMT 16 Ro-En/En-Ro. We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019) and report test set performance in BLEU for direct comparison. For each dataset we compute performance on both raw and distilled settings, resulting in 12 datasets in total. (A BLEU-scoring sketch follows this table.) |
| Researcher Affiliation | Industry | Xiao Shi Huang, Felipe Pérez, Maksims Volkovs; Layer 6 AI; {gary,felipe,maks}@layer6.ai |
| Pseudocode | Yes | Algorithm 1: CMLMC Training (a CMLM-style training sketch follows this table) |
| Open Source Code | Yes | Code for this work is available here: https://github.com/layer6ai-labs/CMLMC. |
| Open Datasets | Yes | We evaluate our approach on multiple public NMT datasets: IWSLT 14 De-En/En-De, WMT 14 De-En/En-De, and WMT 16 Ro-En/En-Ro. We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019) |
| Dataset Splits | Yes | We use the same training/validation/test sets as in previous work (Ghazvininejad et al., 2019) |
| Hardware Specification | Yes | and we train the models on the IBM servers with 160 POWER9 CPUs, 600GB RAM and 4 Tesla V100 GPUs (32G). |
| Software Dependencies | No | The paper mentions using the Fairseq library and Adam optimizer but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Hyper-parameters for each dataset are selected through grid search and are listed in Table B.1 in Appendix. |
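
The Pseudocode row points to Algorithm 1 (CMLMC Training); the authors' full implementation is available in the linked repository. For orientation only, the sketch below illustrates the underlying CMLM-style masked-prediction objective (Ghazvininejad et al., 2019) that CMLMC extends. The model interface, the `MASK_ID`/`PAD_ID` constants, and the uniform masking schedule are illustrative assumptions, not the released code, and the CMLMC correction term is omitted.

```python
# Sketch of a CMLM-style masked-prediction training step (Ghazvininejad et al., 2019).
# Model interface, token ids, and tensor names are illustrative assumptions;
# this is not the authors' Algorithm 1 and omits the CMLMC correction term.
import torch
import torch.nn.functional as F

MASK_ID = 3  # assumed index of the <mask> symbol in the target vocabulary
PAD_ID = 1   # assumed padding index

def cmlm_training_step(model, src_tokens, tgt_tokens):
    """One CMLM-style step: mask a random subset of target tokens and
    compute cross-entropy only at the masked positions."""
    non_pad = tgt_tokens.ne(PAD_ID)                      # (B, T) bool
    lengths = non_pad.sum(dim=1)                         # (B,) tokens per sentence

    # Number of tokens to mask per sentence, uniform in [1, length].
    num_to_mask = (torch.rand_like(lengths, dtype=torch.float) * lengths).long() + 1

    # Select that many random non-pad positions by ranking random scores.
    scores = torch.rand(tgt_tokens.shape, device=tgt_tokens.device)
    scores = scores.masked_fill(~non_pad, 2.0)           # never mask padding
    ranks = scores.argsort(dim=1).argsort(dim=1)         # per-position rank
    mask = ranks < num_to_mask.unsqueeze(1)              # (B, T) bool

    masked_tgt = tgt_tokens.masked_fill(mask, MASK_ID)

    # Assumed forward signature returning per-position vocabulary logits (B, T, V).
    logits = model(src_tokens, masked_tgt)

    # Loss only on the masked positions.
    loss = F.cross_entropy(logits[mask], tgt_tokens[mask])
    return loss
```

At inference time, models trained this way are typically used iteratively (mask-predict): the full target is predicted, low-confidence tokens are re-masked, and the decoder refines them over several steps.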
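
The Research Type row notes that test-set performance is reported in BLEU for direct comparison with prior work. Below is a minimal corpus-level BLEU sketch using sacreBLEU; the file names are placeholders, and the paper's exact scoring pipeline (e.g., fairseq's tokenized BLEU script) may differ.

```python
# Minimal BLEU-scoring sketch with sacreBLEU. File names are placeholders;
# the paper's exact scoring pipeline may differ (e.g., fairseq tokenized BLEU).
import sacrebleu

with open("hypotheses.detok.txt") as f:
    hypotheses = [line.strip() for line in f]
with open("references.detok.txt") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes the hypothesis list and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```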