Improving Context-Aware Neural Machine Translation with Source-side Monolingual Documents

Authors: Linqing Chen, Junhui Li, Zhengxian Gong, Xiangyu Duan, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effectiveness and generality of our pre-trained PGC model by adapting it to various downstream context-aware NMT models. Detailed experimentation on four different translation tasks demonstrates that our PGC approach significantly improves the translation performance of context-aware NMT.
Researcher Affiliation | Collaboration | Linqing Chen1, Junhui Li1, Zhengxian Gong1, Xiangyu Duan1, Boxing Chen2, Weihua Luo2, Min Zhang1 and Guodong Zhou1. 1School of Computer Science and Technology, Soochow University, Suzhou, China; 2Alibaba DAMO Academy.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 3 is an illustration of the model architecture, not pseudocode.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | For ZH-EN, the document-level parallel training corpus includes 41K documents with 780K sentence pairs. We use the NIST MT 2006 dataset as the development set, and the NIST MT 02, 03, 04, 05, 08 datasets as test sets. The Chinese sentences are segmented by Jieba while the English sentences are tokenized and lowercased by Moses scripts (a preprocessing sketch follows the table). For EN-ES, the training set is from IWSLT 2014 and 2015, the development set is dev2010, and the test sets are test2010, test2011, and test2012. For EN-DE (TED), the training set is from IWSLT 2017; we use test2016 and test2017 as test sets and the remaining sets for development. For EN-DE (News), the training set is the News Commentary v11 corpus, the development set is newstest2015, and the test set is newstest2016.
Dataset Splits | Yes | We use the NIST MT 2006 dataset as the development set, and the NIST MT 02, 03, 04, 05, 08 datasets as test sets. For EN-ES, the training set is from IWSLT 2014 and 2015, the development set is dev2010, and the test sets are test2010, test2011, and test2012. For EN-DE (TED), the training set is from IWSLT 2017; we use test2016 and test2017 as test sets and the remaining sets for development. (These splits are summarized in a mapping after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions training time: 'With 500K training steps, we complete 3.0 and 1.2 passes over the pre-training data within 70 and 75 hours for Chinese and English, respectively.'
Software Dependencies | No | The paper mentions 'OpenNMT', 'Transformer', 'Jieba', and 'Moses scripts', but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | For all pre-trained and translation models, we set the numbers of layers in the context encoder, sentence encoder and decoder (i.e., Ng, Ne, and Nd in Figure 3) to 4, 6, and 6, respectively. During inference, we set the beam size to 5. (A configuration sketch summarizing these values follows the table.)
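
The layer counts and beam size quoted in the Experiment Setup row can be collected into a small configuration sketch. Only the numeric values come from the paper; the key names below are hypothetical, since no configuration file or code release accompanies the work.

    # Hyperparameters quoted from the paper; the key names are hypothetical,
    # as the paper provides no configuration file or released code.
    PGC_NMT_CONFIG = {
        "context_encoder_layers": 4,   # Ng in Figure 3
        "sentence_encoder_layers": 6,  # Ne in Figure 3
        "decoder_layers": 6,           # Nd in Figure 3
        "beam_size": 5,                # beam search width at inference time
    }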
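
The preprocessing reported in the Open Datasets row (Jieba segmentation for Chinese, Moses tokenization and lowercasing for English) can be approximated with a short script. This is a minimal sketch that assumes the sacremoses Python port in place of the original Moses perl scripts referenced in the paper.

    import jieba
    from sacremoses import MosesTokenizer

    # sacremoses is a Python port of the Moses tokenizer; the paper itself
    # refers to the original Moses scripts.
    en_tokenizer = MosesTokenizer(lang="en")

    def preprocess_zh(sentence: str) -> str:
        # Segment a Chinese sentence with Jieba and space-join the tokens.
        return " ".join(jieba.cut(sentence.strip()))

    def preprocess_en(sentence: str) -> str:
        # Tokenize with Moses rules, then lowercase, as described in the paper.
        return en_tokenizer.tokenize(sentence.strip(), return_str=True).lower()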
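
For quick reference, the development and test splits quoted in the Open Datasets and Dataset Splits rows can be gathered into a single mapping. The values are descriptive labels taken from the evidence above, not verified corpus file names or paths.

    # Dev/test splits as reported in the paper; values are descriptive labels,
    # not actual corpus file names.
    DATASET_SPLITS = {
        "ZH-EN": {
            "dev": "NIST MT 2006",
            "test": ["NIST MT 02", "NIST MT 03", "NIST MT 04", "NIST MT 05", "NIST MT 08"],
        },
        "EN-ES": {
            "dev": "dev2010",
            "test": ["test2010", "test2011", "test2012"],
        },
        "EN-DE (TED)": {
            "dev": "remaining IWSLT 2017 dev sets",
            "test": ["test2016", "test2017"],
        },
        "EN-DE (News)": {
            "dev": "newstest2015",
            "test": ["newstest2016"],
        },
    }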