Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation

Authors: Guanhua Chen, Yun Chen, Yong Wang, Victor O.K. Li

IJCAI 2020

Reproducibility assessment. Each entry below gives the variable assessed, the result, and the LLM response cited as evidence.

Research Type: Experimental
LLM response: "Extensive experiments on several language pairs demonstrate that our approach achieves superior translation results over the existing systems, improving translation of constrained sentences without hurting the unconstrained ones."

Researcher Affiliation: Academia
LLM response: "The University of Hong Kong; Shanghai University of Finance and Economics. {ghchen, wangyong, vli}@eee.hku.hk, yunchen@sufe.edu.cn"

Pseudocode: No
LLM response: The paper does not contain structured pseudocode or algorithm blocks.
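
Since the paper releases code but no pseudocode, a rough sketch may help convey what constraint-oriented data augmentation generally looks like. The snippet below is an illustrative guess based only on the paper's title and the reported cap of 50 constrained phrases, not the authors' actual procedure: it samples a few target-side phrases and appends them to the source sentence behind a separator token, a common recipe for lexically constrained NMT. The `<sep>` token and sampling parameters are hypothetical.

```python
import random

SEP = "<sep>"          # hypothetical separator token, not confirmed by the paper
MAX_CONSTRAINTS = 50   # cap reported in the paper's experiment setup

def augment(src_tokens, tgt_tokens, max_phrases=3, max_len=3):
    """Illustrative constraint-style augmentation: sample a few target
    phrases and append them to the source behind SEP tokens.
    A generic sketch, not the authors' exact algorithm."""
    n = min(random.randint(0, max_phrases), MAX_CONSTRAINTS)
    constraints = []
    for _ in range(n):
        if not tgt_tokens:
            break
        start = random.randrange(len(tgt_tokens))
        length = random.randint(1, max_len)
        constraints.append(tgt_tokens[start:start + length])
    augmented = list(src_tokens)
    for phrase in constraints:
        augmented.append(SEP)
        augmented.extend(phrase)
    return augmented, tgt_tokens

src = "wir sehen uns morgen".split()
tgt = "see you tomorrow".split()
print(augment(src, tgt))  # e.g. (['wir', ..., '<sep>', 'tomorrow', ...], [...])
```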

Open Source Code: Yes
LLM response: "Our code is available at https://github.com/ghchen18/leca."

Open Datasets: Yes
LLM response: "For the De-En task, we use WMT16 news data as training corpus... For the Zh-En task, we use 1.25M parallel sentences extracted from NIST corpora as the training data." The cited NIST corpora are LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08 and LDC2005T06.

Dataset Splits: Yes
LLM response: "For the De-En task, we use WMT16 news data as training corpus, newstest2013 as the development set and newstest2014 as the test set. For the Zh-En task, we use 1.25M parallel sentences extracted from NIST corpora as the training data. The NIST MT04 dataset serves as the development set, and a combination of the NIST MT02, 03, 05, 06, 08 datasets serves as the test set."
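
For quick reference, the reported corpora and splits can be restated as a small configuration mapping. The dictionary below only repeats what the quoted passages say; the keys are descriptive labels, not file paths.

```python
# Splits exactly as reported in the paper; values are corpus names, not paths.
SPLITS = {
    "de-en": {
        "train": "WMT16 news data",
        "dev": "newstest2013",
        "test": "newstest2014",
    },
    "zh-en": {
        "train": "NIST corpora, 1.25M sentence pairs (LDC2002E18, LDC2003E07, "
                 "LDC2003E14, LDC2004T07, LDC2004T08, LDC2005T06)",
        "dev": "NIST MT04",
        "test": "NIST MT02, MT03, MT05, MT06, MT08 (combined)",
    },
}
```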

Hardware Specification: Yes
LLM response: "The decoding speed is tested on a single GeForce RTX 2080 Ti GPU and is averaged over five runs."

Software Dependencies: No
LLM response: The paper mentions using fairseq but does not specify a version number for it or for any other software library.
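
Because no library versions are pinned, anyone re-running the released code would need to record them independently. A minimal snippet for doing so, assuming fairseq and PyTorch are installed in the environment:

```python
# Record the library versions actually used, since the paper pins none.
import sys
import torch
import fairseq

print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("fairseq:", fairseq.__version__)
```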

Experiment Setup: Yes
LLM response: "We use the base Transformer model described in Vaswani et al. [2017] but share all embeddings. The maximum number of constrained phrases is set as 50. We use Adam [Kingma and Ba, 2015] and label smoothing for training. The learning rate is 0.0005 and the warmup step is 16000. All the dropout probabilities are set to 0.3. The maximum update number is 100k for the De-En language pair and 60k for the Zh-En language pair. We use beam search with a beam size of 10."
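
The quoted hyperparameters map onto a standard Transformer-base training recipe. The sketch below reconstructs the optimizer and warmup schedule in plain PyTorch under stated assumptions: an inverse-sqrt schedule is assumed (the paper names only the learning rate and warmup step), the Adam betas and the label-smoothing value of 0.1 are assumed common defaults the paper does not state, and `model` is a stand-in for a base Transformer with shared embeddings.

```python
import torch

LR, WARMUP = 5e-4, 16000   # reported: lr 0.0005, warmup step 16000
DROPOUT = 0.3              # reported: all dropout probabilities 0.3

# Stand-in for the base Transformer with shared embeddings used in the paper.
model = torch.nn.Transformer(d_model=512, nhead=8, dropout=DROPOUT)

# Betas assumed (a common Transformer setting); the paper only says "Adam".
optimizer = torch.optim.Adam(model.parameters(), lr=LR, betas=(0.9, 0.98))

# Inverse-sqrt warmup schedule (assumed; the paper names no scheduler):
# linear ramp for WARMUP steps, then decay proportional to 1/sqrt(step).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min((step + 1) / WARMUP, (WARMUP / (step + 1)) ** 0.5),
)

# Label smoothing assumed at the common default of 0.1 (value not stated).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# Training runs for 100k updates (De-En) or 60k updates (Zh-En); decoding
# uses beam search with beam size 10, handled by the toolkit at inference.
```

With this schedule, the learning rate peaks at 0.0005 after 16,000 updates and then decays as the inverse square root of the step count, which matches the quoted learning rate and warmup step.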