Neural Machine Translation with Error Correction
Authors: Kaitao Song, Xu Tan, Jianfeng Lu
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three IWSLT translation datasets and two WMT translation datasets demonstrate that our method achieves improvements over Transformer baseline and scheduled sampling. Further experimental analyses also verify the effectiveness of our proposed error correction mechanism to improve the translation quality. |
| Researcher Affiliation | Collaboration | Nanjing University of Science and Technology; Microsoft Research Asia. {kt.song, lujf}@njust.edu.cn, xuta@microsoft.com |
| Pseudocode | No | The paper describes the proposed method textually and with diagrams (Figure 1, Figure 2), but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is implemented on fairseq [Ott et al., 2019], and we will release our code under this link: https://github.com/StillKeepTry/ECM-NMT. |
| Open Datasets | Yes | IWSLT datasets. For IWSLT14 De→En, it contains 160K and 7K sentence pairs in the training set and valid set. ... IWSLT datasets can be downloaded from https://wit3.fbk.eu/archive/2014-01/texts. WMT datasets. WMT14 En→De and WMT16 En→Ro translation tasks contain 4.5M and 2.8M bilingual data for training. |
| Dataset Splits | Yes | For IWSLT14 De→En, it contains 160K and 7K sentence pairs in the training set and valid set. ... For IWSLT14 Es→En and He→En, they contain 180K and 150K bilingual data for training. We choose TED.tst2013 as the valid set and TED.tst2014 as the test set. ... Following previous work [Vaswani et al., 2017], we concatenate newstest2012 and newstest2013 as the valid set, and choose newstest2014 as the test set for WMT14 En→De. For WMT16 En→Ro, we choose newsdev2016 as the valid set and newstest2016 as the test set. (A hypothetical fairseq preprocessing sketch for these splits follows the table.) |
| Hardware Specification | Yes | The IWSLT tasks are trained on a single NVIDIA P40 GPU for 100K steps and the WMT tasks are trained with 8 NVIDIA P40 GPUs for 300K steps, where each GPU is filled with 4096 tokens. |
| Software Dependencies | No | The paper states 'Our code is implemented on fairseq [Ott et al., 2019]', but does not specify a version number for fairseq or any other software dependencies. |
| Experiment Setup | Yes | For IWSLT tasks, we use 6 Transformer blocks, where the attention heads, hidden size and filter size are 4, 512 and 1024. Dropout is set as 0.3... For WMT tasks, we use 6 Transformer blocks, where the attention heads, hidden size and filter size are 16, 1024 and 4096. ... Dropout is set as 0.3 and 0.2... For the decay function of the sampling probability, we set a, b and µ as 30,000, 0.85 and 5,000. The λ for L_ECM is tuned on the valid set, and an optimal choice is 1.0. During training, we use Adam as the default optimizer, with a linear decay of learning rate. ... The beam size and length penalty are set as 5 and 1.0 for each task except WMT14 En→De, which uses a beam size of 4 and a length penalty of 0.6. (A hypothetical training/decoding sketch follows the table.) |
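
The dataset and split rows above can be turned into a concrete preprocessing step. The sketch below is an assumption, not the authors' released pipeline: it assumes fairseq is installed, that the IWSLT14 De→En data from https://wit3.fbk.eu/archive/2014-01/texts has already been tokenized and BPE-encoded into `train`/`valid`/`test` files (for example via fairseq's stock `examples/translation/prepare-iwslt14.sh`), and that the directory names `iwslt14.tokenized.de-en` and `data-bin/iwslt14.de-en` are placeholders.

```python
"""Hypothetical binarization of the IWSLT14 De->En split quoted above."""
import subprocess

DATA_DIR = "iwslt14.tokenized.de-en"   # placeholder: tokenized + BPE'd raw text
DEST_DIR = "data-bin/iwslt14.de-en"    # placeholder: binarized output directory

# Binarize the ~160K training pairs and ~7K validation pairs quoted in the paper,
# plus the held-out test set, using fairseq's standard preprocessing CLI.
subprocess.run(
    [
        "fairseq-preprocess",
        "--source-lang", "de", "--target-lang", "en",
        "--trainpref", f"{DATA_DIR}/train",
        "--validpref", f"{DATA_DIR}/valid",
        "--testpref", f"{DATA_DIR}/test",
        "--destdir", DEST_DIR,
        "--joined-dictionary",   # assumption: a shared source/target vocabulary
        "--workers", "4",
    ],
    check=True,
)
```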
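The hardware and experiment-setup rows likewise translate into a baseline training and decoding run. The sketch below only reproduces a standard Transformer baseline with the quoted hyperparameters (6 blocks, 4 heads, 512 hidden, 1024 FFN via fairseq's `transformer_iwslt_de_en` architecture, dropout 0.3, 4096 tokens per GPU, 100K updates, Adam, beam 5, length penalty 1.0); the paper's error-correction loss (λ for L_ECM) and the sampling-probability decay (a, b, µ) live in the authors' own code and are not modelled here. The peak learning rate, warmup schedule, and label smoothing are assumptions not stated in the quoted text.

```python
"""Hypothetical baseline training/decoding run matching the Experiment Setup row."""
import subprocess

DATA_BIN = "data-bin/iwslt14.de-en"    # placeholder from the preprocessing sketch
SAVE_DIR = "checkpoints/iwslt14-de-en"

# Train on one GPU for 100K updates; `transformer_iwslt_de_en` already uses
# 6 layers, 4 attention heads, hidden size 512 and filter size 1024.
subprocess.run(
    [
        "fairseq-train", DATA_BIN,
        "--arch", "transformer_iwslt_de_en",
        "--optimizer", "adam", "--adam-betas", "(0.9, 0.98)",
        "--lr", "5e-4",                         # assumption: peak LR not quoted
        "--lr-scheduler", "polynomial_decay",   # linear decay with the default power of 1.0
        "--total-num-update", "100000",
        "--warmup-updates", "4000",             # assumption: warmup not quoted
        "--dropout", "0.3",
        "--criterion", "label_smoothed_cross_entropy",
        "--label-smoothing", "0.1",             # assumption: smoothing not quoted
        "--max-tokens", "4096",
        "--max-update", "100000",
        "--save-dir", SAVE_DIR,
    ],
    check=True,
)

# Decode the test set with the quoted beam size and length penalty.
subprocess.run(
    [
        "fairseq-generate", DATA_BIN,
        "--path", f"{SAVE_DIR}/checkpoint_best.pt",
        "--beam", "5", "--lenpen", "1.0",
        "--remove-bpe",
    ],
    check=True,
)
```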