Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning

Authors: Rongxiang Weng, Qiang Wang, Wensen Cheng, Changfeng Zhu, Min Zhang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on the NIST Chinese to English task, three WMT translation tasks, and the TED M2O task. The results demonstrate that CONMT effectively improves overall translation quality and reliably handles diverse inputs.
Researcher Affiliation | Collaboration | Rongxiang Weng1,2, Qiang Wang3,4, Wensen Cheng2, Changfeng Zhu2, Min Zhang1. 1Soochow University, Suzhou, China; 2miHoYo AI, Shanghai, China; 3Zhejiang University, Hangzhou, China; 4Royal Flush AI Research Institute, Hangzhou, China.
Pseudocode | Yes | Algorithm 1: The training process of NMT with CAML
Open Source Code | No | The paper states "We implement our approach on fairseq" (footnote 2) and links only to fairseq (https://github.com/pytorch/fairseq), which is a third-party framework, not their specific implementation.
Open Datasets | Yes | We first conduct experiments on the NIST Chinese to English (Zh En) task and three widely-used WMT translation tasks: WMT14 English to German (En De), WMT16 Romanian to English (Ro En), and WMT18 Chinese to English (Zh En). ... Moreover, we evaluate our method on the TED multilingual machine translation task." and "Following Cheng et al. (2020), we sample 1.25M English sentences from the Xinhua portion of the gigaword corpus as the extra corpus to enhance the model.5" with footnote 5: "https://catalog.ldc.upenn.edu/LDC2003T05".
Dataset Splits | Yes | On the NIST Zh En task, we use NIST 2006 (MT06) as the dev set and NIST 2002 (MT02), 2003 (MT03), 2004 (MT04), 2005 (MT05), 2008 (MT08) as the test sets. On the En De task, we use newstest2013 as the dev set and newstest2014 as the test set. On the Ro En task, we use newstest2015 as the dev set and newstest2016 as the test set. On the Zh En task, we use newsdev2017 as the dev set and newstest2017 as the test set.
Hardware Specification | Yes | We use 4 V100 GPUs and accumulate the gradient for four iterations on the WMT En De and Zh En. Other tasks run on 2 V100 GPUs.
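Gradient accumulation over four iterations, as described above, trades memory for effective batch size: each update sums the gradients of several micro-batches before stepping. A minimal pure-Python sketch (not the paper's fairseq configuration) shows why scaling each micro-batch gradient by 1/num_accum reproduces the full-batch gradient of a mean loss:

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error 0.5 * mean((w*x - y)**2) w.r.t. scalar w."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

# Gradient of one full batch of 8 examples.
g_full = grad_mse(w, xs, ys)

# Accumulate over 4 micro-batches of 2 examples each,
# scaling each contribution by 1 / num_accum before the (single) update.
num_accum = 4
g_accum = 0.0
for i in range(num_accum):
    mb_x = xs[2 * i : 2 * i + 2]
    mb_y = ys[2 * i : 2 * i + 2]
    g_accum += grad_mse(w, mb_x, mb_y) / num_accum

assert abs(g_full - g_accum) < 1e-12
```

In fairseq this behavior corresponds to running more update iterations per optimizer step rather than enlarging the per-GPU batch, which is how 4 GPUs can emulate a much larger effective batch.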
Software Dependencies | Yes | For a fair comparison, we calculate the case-sensitive tokenized BLEU with the multi-bleu.perl script for the NIST Zh En, and use sacreBLEU (footnotes 3, 4) to calculate case-sensitive BLEU (Papineni et al. 2002) for all WMT and TED tasks." and footnote 4: "BLEU+case.mixed+lang.${Src}-${Trg}+numrefs.1+smooth.exp+test.${Task}+tok.13a+version.1.5.1".
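The sacreBLEU signature above pins the metric's options (case-mixed, one reference, 13a tokenization, exponential smoothing, version 1.5.1) so scores are comparable across papers. As a reminder of what the metric computes, here is a minimal corpus-free sketch of sentence-level BLEU (Papineni et al. 2002): the geometric mean of modified n-gram precisions for n = 1..4, times a brevity penalty. It omits tokenization and smoothing, so it is an illustration of the formula, not a substitute for multi-bleu.perl or sacreBLEU:

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    """Unsmoothed BLEU for one hypothesis against one reference."""
    hyp, ref = hyp.split(), ref.split()
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        precisions.append(overlap / max(1, sum(h.values())))
    if min(precisions) == 0.0:
        return 0.0  # real implementations smooth this case (e.g. smooth.exp)
    # Brevity penalty: only punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else exp(1 - len(ref) / max(1, len(hyp)))
    return bp * exp(sum(log(p) for p in precisions) / max_n)
```

For example, `bleu("the cat sat on the mat", "the cat sat on the mat")` is 1.0, while a hypothesis sharing no n-grams with the reference scores 0.0.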
Experiment Setup | Yes | We set label smoothing as 0.1 and dropout rate as 0.1. The Adam is adopted as the optimizer, and the β1/β2 is set as 0.9/0.98 for the base setting and 0.9/0.998 for the big setting. The initial learning rate is 0.001. We adopt the warm-up strategy with 4000 steps. The learning rate γ of meta-train remains one-tenth of NMT. Other settings are followed as Vaswani et al. (2017). ... We use beam search as the decoding algorithm and set the beam size as 5 and the length penalty as 0.6.
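Two quantities in this setup have standard closed forms, sketched below under assumptions: that the 4000-step warm-up follows fairseq's inverse_sqrt policy (linear warm-up to the peak learning rate 0.001, then decay proportional to step^-0.5, matching Vaswani et al. 2017), and that the 0.6 length penalty is the GNMT-style formulation commonly used with beam search. Neither is confirmed by the excerpt, so treat this as an illustration rather than the paper's exact code:

```python
def inverse_sqrt_lr(step, peak_lr=0.001, warmup=4000):
    """Linear warm-up to peak_lr over `warmup` steps, then inverse-sqrt decay."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * (warmup / step) ** 0.5

def length_penalty(length, alpha=0.6):
    """GNMT-style length penalty: ((5 + |Y|) / 6) ** alpha."""
    return ((5 + length) / 6) ** alpha
```

During beam search, each hypothesis Y is ranked by its log-probability divided by `length_penalty(len(Y))`, which keeps the beam from unduly favoring short outputs; the schedule peaks at 0.001 exactly at step 4000 and decays thereafter.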