Rephrasing the Reference for Non-autoregressive Machine Translation
Authors: Chenze Shao, Jinchao Zhang, Jie Zhou, Yang Feng
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. |
| Researcher Affiliation | Collaboration | Chenze Shao (1,2), Jinchao Zhang (3), Jie Zhou (3), Yang Feng (1,2). 1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; 2 University of Chinese Academy of Sciences; 3 Pattern Recognition Center, WeChat AI, Tencent Inc, China. {shaochenze18z,fengyang}@ict.ac.cn, {dayerzhang,withtomzhou}@tencent.com |
| Pseudocode | No | No pseudocode or algorithm blocks found in the paper. |
| Open Source Code | Yes | Reproducible code: https://github.com/ictnlp/Rephraser-NAT. |
| Open Datasets | Yes | We conducted experiments on major benchmarking datasets in previous NAT studies: WMT14 English→German (En→De, 4.5M sentence pairs) and WMT16 English→Romanian (En→Ro, 0.6M sentence pairs). |
| Dataset Splits | Yes | For WMT14 En De, the validation set is newstest2013 and the test set is newstest2014. For WMT16 En Ro, the validation set is newsdev-2016 and the test set is newstest-2016. |
| Hardware Specification | Yes | We use the GeForce RTX 3090 GPU to train models and measure the translation latency. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and techniques (BPE) but does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We set α_max to 0.75, α_min to 0.5, the sampling times K to 2, and the depth of the rephraser N_r to 2 across all datasets. For CTC, the length of the decoder inputs is 3 times as long as the source length. All models are optimized with Adam (Kingma and Ba 2014) with β = (0.9, 0.98) and ϵ = 10^−8. For vanilla NAT and CTC, each batch contains approximately 64K source words. For CMLM, (...) we use a batch size of 128K and use 5 length candidates for inference. All models are pre-trained for 300K steps and fine-tuned for 30K steps. During fine-tuning, we measure validation BLEU every 500 steps and average the 5 best checkpoints to obtain the final model. (See the configuration sketch below the table.) |
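
The reported setup maps onto a small training configuration. The sketch below restates those hyperparameters in code and illustrates the two concrete pieces they imply: the Adam optimizer settings and averaging of the 5 best checkpoints by validation BLEU. The names `CONFIG`, `build_optimizer`, and `average_checkpoints`, as well as the placeholder learning rate, are assumptions for illustration and are not taken from the authors' released Rephraser-NAT code.

```python
# Hedged sketch of the training configuration described in the paper.
# All identifiers below are illustrative, not the authors' actual code.
import torch

CONFIG = {
    # Rephraser hyperparameters, shared across all datasets
    "alpha_max": 0.75,
    "alpha_min": 0.5,
    "sampling_times_K": 2,
    "rephraser_depth_N_r": 2,
    # CTC: decoder input length is 3x the source length
    "ctc_upsample_ratio": 3,
    # Batching: ~64K source words for vanilla NAT / CTC, 128K for CMLM
    "batch_words": {"vanilla_nat": 64_000, "ctc": 64_000, "cmlm": 128_000},
    "cmlm_length_candidates": 5,
    # Training schedule
    "pretrain_steps": 300_000,
    "finetune_steps": 30_000,
    "validate_every_steps": 500,
    "checkpoints_to_average": 5,
}


def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam with beta = (0.9, 0.98) and epsilon = 1e-8, as stated in the paper.
    The learning rate schedule is not given in the excerpt, so lr is a placeholder."""
    return torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98), eps=1e-8)


def average_checkpoints(paths):
    """Average parameter tensors across checkpoints (e.g. the 5 best by validation BLEU).
    Assumes each file holds a flat state_dict of tensors."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}
```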