Lexical-Constraint-Aware Neural Machine Translation via Data Augmentation
Authors: Guanhua Chen, Yun Chen, Yong Wang, Victor O.K. Li
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several language pairs demonstrate that our approach achieves superior translation results over the existing systems, improving translation of constrained sentences without hurting the unconstrained ones. |
| Researcher Affiliation | Academia | 1The University of Hong Kong 2Shanghai University of Finance and Economics {ghchen, wangyong, vli}@eee.hku.hk, yunchen@sufe.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | 1Our code is available at https://github.com/ghchen18/leca. |
| Open Datasets | Yes | For the De-En task, we use WMT16 news data as training corpus... For the Zh-En task, we use 1.25M parallel sentences extracted from NIST corpora2 as the training data. 2The corpora include LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08 and LDC2005T06 |
| Dataset Splits | Yes | For the De-En task, we use WMT16 news data as training corpus, newstest2013 as the development set and newstest2014 as the test set. For the Zh-En task, we use 1.25M parallel sentences extracted from NIST corpora2 as the training data. The NIST MT04 dataset serves as the development set, and a combination of the NIST MT02, 03, 05, 06, 08 datasets serves as the test set. |
| Hardware Specification | Yes | The decoding speed is tested on a single GeForce RTX 2080 Ti GPU and is averaged over five runs. |
| Software Dependencies | No | The paper mentions using 'fairseq' but does not specify a version number for it or any other software libraries. |
| Experiment Setup | Yes | We use the base Transformer model described in Vaswani et al. [2017] but share all embeddings. The maximum number of constrained phrases is set as 50. We use Adam [Kingma and Ba, 2015] and label smoothing for training. The learning rate is 0.0005 and warmup step is 16000. All the drop-out probabilities are set to 0.3. Maximum update number is 100k for the De-En language pair and 60k for the Zh-En language pair. We use beam search with a beam size of 10. |
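
For readers who want to approximate the reported setup with vanilla fairseq, the sketch below maps the hyperparameters quoted under "Experiment Setup" onto standard `fairseq-train` / `fairseq-generate` flags. It is a minimal sketch, not the authors' command: the data directory, checkpoint path, learning-rate scheduler, and label-smoothing value are assumptions (common fairseq defaults), and the lexical-constraint-aware model itself, including the 50-constraint limit, is implemented in the authors' repository (https://github.com/ghchen18/leca) rather than by these flags.

```python
# Hedged sketch: reported hyperparameters expressed as standard fairseq CLI flags.
# Assumptions (not stated in the paper): data-bin path, inverse_sqrt scheduler,
# label-smoothing value 0.1, and the checkpoint path used for decoding.
import subprocess

train_cmd = [
    "fairseq-train", "data-bin/wmt16_de_en",       # hypothetical binarized data dir
    "--arch", "transformer",                       # base Transformer [Vaswani et al., 2017]
    "--share-all-embeddings",                      # "share all embeddings"
    "--optimizer", "adam",                         # Adam [Kingma and Ba, 2015]
    "--lr", "0.0005",                              # learning rate 0.0005
    "--lr-scheduler", "inverse_sqrt",              # scheduler assumed, not stated
    "--warmup-updates", "16000",                   # warmup step 16000
    "--dropout", "0.3",                            # all dropout probabilities 0.3
    "--criterion", "label_smoothed_cross_entropy", # label smoothing for training
    "--label-smoothing", "0.1",                    # value assumed; paper gives none
    "--max-update", "100000",                      # 100k updates for De-En (60k for Zh-En)
]

decode_cmd = [
    "fairseq-generate", "data-bin/wmt16_de_en",
    "--path", "checkpoints/checkpoint_best.pt",    # hypothetical checkpoint path
    "--beam", "10",                                # beam size 10
]

if __name__ == "__main__":
    subprocess.run(train_cmd, check=True)
    subprocess.run(decode_cmd, check=True)
```

The paper does not state a fairseq version (see "Software Dependencies" above), so flag names should be checked against the version pinned in the authors' repository before use.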