Modeling Fluency and Faithfulness for Diverse Neural Machine Translation

Authors: Yang Feng, Wanying Xie, Shuhao Gu, Chenze Shao, Wen Zhang, Zhengxin Yang, Dong Yu

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on multiple translation tasks show that our method can achieve significant improvements over strong baselines.
Researcher Affiliation | Collaboration | 1) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); 2) University of Chinese Academy of Sciences, Beijing, China; 3) Beijing Language and Culture University, China; 4) Smart Platform Product Department of Tencent Inc., China
Pseudocode | No | The whole architecture is shown in Figure 1.
Open Source Code | Yes | Our code can be obtained at https://github.com/ictnlp/Diverse-NMT
Open Datasets | Yes | CN→EN: the training data consists of 1.25M sentence pairs from LDC corpora, with 27.9M Chinese words and 34.5M English words respectively; the corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08 and LDC2005T06. EN→DE: the training data is from WMT2014 and consists of about 4.5M sentence pairs with 118M English words and 111M German words. EN→RO: we used the preprocessed version of the WMT16 English-Romanian dataset released by Lee, Mansimov, and Cho (2018), which includes 0.6M sentence pairs.
Dataset Splits | Yes (collected in the configuration sketch after the table) | CN→EN: MT02 is used for validation and MT03, MT04, MT05, MT06 and MT08 are used for test. EN→DE: newstest2013 is used for validation and newstest2014 for test. EN→RO: newsdev2016 is used for validation and newstest2016 for test.
Hardware Specification | No | All the Transformer-based systems have the same configuration as the base model described in Vaswani et al. (2017).
Software Dependencies | No | An open-source toolkit, Fairseq-py, released by Facebook (Edunov, Ott, and Gross 2017) and implemented strictly following Vaswani et al. (2017).
Experiment Setup | Yes (see the training-schedule sketch after the table) | All the Transformer-based systems have the same configuration as the base model described in Vaswani et al. (2017). Translation quality was evaluated with the multi-bleu.perl script (Papineni et al. 2002) based on case-insensitive n-gram matching with n up to 4. For the evaluation module, the fluency part is composed of a stack of N = 6 layers. In training, we first pretrain the translation and evaluation modules together with the loss L_pretrain = L_t + L_e. Near convergence, we introduce L_c and fine-tune the model with the loss in Equation 14.
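For reference, the corpora and validation/test splits quoted in the two dataset rows can be collected into one configuration. This is only a restatement of the table as a data structure; the dictionary layout and key names are ours and are not part of the released code.

```python
# Datasets and splits as quoted in the table above.
# The structure and key names are illustrative assumptions, not the paper's code.
DATASETS = {
    "CN-EN": {
        "train": "LDC corpora (LDC2002E18, LDC2003E07, LDC2003E14, "
                 "Hansards portion of LDC2004T07, LDC2004T08, LDC2005T06), "
                 "1.25M sentence pairs",
        "valid": "MT02",
        "test": ["MT03", "MT04", "MT05", "MT06", "MT08"],
    },
    "EN-DE": {
        "train": "WMT2014, ~4.5M sentence pairs",
        "valid": "newstest2013",
        "test": ["newstest2014"],
    },
    "EN-RO": {
        "train": "WMT16 (preprocessed release of Lee, Mansimov, and Cho 2018), "
                 "0.6M sentence pairs",
        "valid": "newsdev2016",
        "test": ["newstest2016"],
    },
}

if __name__ == "__main__":
    for pair, splits in DATASETS.items():
        print(pair, "| valid:", splits["valid"], "| test:", ", ".join(splits["test"]))
```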
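The two-stage schedule quoted in the Experiment Setup row (pretraining on L_pretrain = L_t + L_e, then introducing L_c near convergence) can be sketched as below. This is a minimal illustration under stated assumptions, not the released implementation: the convergence test on validation loss, the unit loss weights, and the way L_c is simply added in stage two are our assumptions; the exact combined loss is given by Equation 14 in the paper.

```python
# Minimal sketch of the two-stage training schedule described in the table.
# Assumptions (not from the paper): the convergence heuristic, equal loss
# weights, and the placeholder losses; only the schedule itself is quoted.
import torch


def combined_loss(loss_t, loss_e, loss_c=None):
    """Stage 1: L_pretrain = L_t + L_e.  Stage 2: additionally include L_c
    (stand-in for the combined loss of Equation 14)."""
    loss = loss_t + loss_e
    if loss_c is not None:
        loss = loss + loss_c
    return loss


def near_convergence(val_losses, patience=3, eps=1e-3):
    """Heuristic convergence test (assumption): validation loss has not
    improved by more than `eps` over the last `patience` checks."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) > best_before - eps


# Toy loop showing when the extra term L_c is switched on.
fine_tuning = False
val_history = []
for epoch in range(20):
    # Placeholders for the real translation / evaluation / consistency losses.
    loss_t = torch.rand(1)   # translation-module loss L_t
    loss_e = torch.rand(1)   # evaluation-module loss L_e
    loss_c = torch.rand(1) if fine_tuning else None
    loss = combined_loss(loss_t, loss_e, loss_c)

    val_history.append(float(torch.rand(1)))  # stand-in for a validation check
    if not fine_tuning and near_convergence(val_history):
        fine_tuning = True   # "near convergence, we introduce L_c and fine-tune"
```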