Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision
Authors: Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li
AAAI 2022, pp. 10776-10784
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. |
| Researcher Affiliation | Collaboration | Chenyang Huang (1,2), Hao Zhou (2), Osmar R. Zaïane (1), Lili Mou (1), Lei Li (3). (1) Department of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta; (2) ByteDance AI Lab; (3) University of California, Santa Barbara. Emails: {chenyangh,zaiane}@ualberta.ca, zhouhao.nlp@bytedance.com, doublepower.mou@gmail.com, lilei@cs.ucsb.edu |
| Pseudocode | No | The paper describes its model and methods using textual descriptions and mathematical equations but does not include any structured pseudocode or algorithm blocks. (A hedged sketch of the layer-wise prediction and deep supervision idea appears after this table.) |
| Open Source Code | Yes | Our code, training/evaluation scripts, and output are available at https://github.com/MANGA-UOFA/DSLP.git. |
| Open Datasets | Yes | Datasets. We evaluated our models on benchmark translation datasets: WMT'14 English-German (4.0M sentence pairs) and WMT'16 English-Romanian (610K pairs). For fair comparison, we obtained the preprocessed corpus (tokenization and vocabulary) released by previous work: Zhou, Gu, and Neubig (2020) for WMT'14 EN-DE, and Lee, Mansimov, and Cho (2018) for WMT'16 EN-RO. |
| Dataset Splits | No | The paper states the use of the WMT'14 EN-DE and WMT'16 EN-RO benchmark datasets, and that preprocessed corpora from previous work were obtained for fair comparison. However, it does not explicitly state the train/validation/test split percentages, sample counts, or specific split files within its text. |
| Hardware Specification | Yes | To measure inference latency, we used a single Nvidia V100 GPU and performed inference with one sentence at a time. (A sketch of this latency-measurement protocol appears after the table.) |
| Software Dependencies | No | Our models were implemented and evaluated with the open-source toolkit Fairseq (Ott et al. 2019). No specific version numbers for software components are provided. |
| Experiment Setup | Yes | Hyperparameters. We mostly followed the standard hyperparameters used in NAT research. We used a batch size of 128K tokens for EN-DE and 32K tokens for EN-RO, with a maximum of 300K updates. For regularization, we set the dropout rate to 0.1 for EN-DE and 0.3 for EN-RO. For the mixed training, we used a fixed mixing ratio λ and set it to 0.3. (A sketch of such a mixing step appears after the table.) |
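
Since the paper offers no pseudocode, here is a minimal PyTorch sketch of the idea named in the title: each decoder layer produces a token prediction, that prediction is fed into the next layer (layer-wise prediction), and a cross-entropy loss is applied at every layer (deep supervision). The class name, the shared classifier, and the concatenate-then-project fusion step are assumptions for illustration, not the authors' implementation; their repository linked above has the real code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerWiseNATDecoder(nn.Module):
    """Hypothetical non-autoregressive decoder with layer-wise supervision."""

    def __init__(self, vocab_size, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.classifier = nn.Linear(d_model, vocab_size)  # shared output head
        self.embed = nn.Embedding(vocab_size, d_model)    # embeds layer predictions
        self.merge = nn.Linear(2 * d_model, d_model)      # fuses state + prediction

    def forward(self, x, memory, targets=None):
        # x: (batch, tgt_len, d_model) decoder input; memory: encoder states
        total_loss, logits = 0.0, None
        for layer in self.layers:
            h = layer(x, memory)                  # unmasked self-attention (NAT)
            logits = self.classifier(h)           # layer-wise prediction
            if targets is not None:               # deep supervision at every layer
                total_loss = total_loss + F.cross_entropy(
                    logits.transpose(1, 2), targets
                )
            pred = self.embed(logits.argmax(dim=-1))
            x = self.merge(torch.cat([h, pred], dim=-1))  # feed to next layer
        return logits, total_loss                 # final prediction, summed loss
```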
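The latency protocol quoted in the hardware row (single V100, one sentence at a time) is a standard setup; a minimal sketch of how it might be timed, assuming a hypothetical `model` and a list of source-token tensors, looks like this:

```python
import time
import torch

@torch.no_grad()
def mean_latency(model, sentences, device="cuda"):
    """Average seconds per sentence, batch size 1, on a single GPU."""
    model.eval().to(device)
    timings = []
    for src in sentences:
        src = src.to(device).unsqueeze(0)  # batch of one sentence
        torch.cuda.synchronize()           # finish any pending GPU work
        start = time.perf_counter()
        model(src)
        torch.cuda.synchronize()           # wait for this forward pass
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```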
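Finally, for the mixed training with fixed ratio λ = 0.3 mentioned in the setup row: one common reading is that each predicted token is replaced by its ground-truth counterpart with probability λ during training. The sketch below is an assumption about the mechanism, not the authors' code:

```python
import torch

def mix_with_gold(pred_tokens, gold_tokens, mix_ratio=0.3):
    """pred_tokens, gold_tokens: (batch, tgt_len) integer token ids.

    Each position keeps the model's prediction with probability 1 - mix_ratio
    and is overwritten by the ground-truth token with probability mix_ratio.
    """
    mask = torch.rand(pred_tokens.shape, device=pred_tokens.device) < mix_ratio
    return torch.where(mask, gold_tokens, pred_tokens)
```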