Neural Machine Translation with Reconstruction
Authors: Zhaopeng Tu, Yang Liu, Lifeng Shang, Xiaohua Liu, Hang Li
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that the proposed framework significantly improves the adequacy of NMT output and achieves superior translation result over state-of-the-art NMT and statistical MT systems. |
| Researcher Affiliation | Collaboration | Noah's Ark Lab, Huawei Technologies, Hong Kong {tu.zhaopeng,shang.lifeng,liuxiaohua3,hangli.hl}@huawei.com; Department of Computer Science and Technology, Tsinghua University, Beijing liuyang2011@tsinghua.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | No | No concrete statement or link regarding the availability of source code for the described methodology is provided in the paper. |
| Open Datasets | Yes | The training dataset consists of 1.25M sentence pairs extracted from LDC corpora, with 27.9M Chinese words and 34.5M English words respectively. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08 and LDC2005T06. |
| Dataset Splits | Yes | We choose the NIST 2002 (MT02) dataset as validation set, and the NIST 2005 (MT05), 2006 (MT06) and 2008 (MT08) datasets as test sets. |
| Hardware Specification | Yes | For training, when running on a single GPU device (Tesla K80), the speed of the baseline model is 960 target words per second, while the speed of the proposed model is 500 target words per second. |
| Software Dependencies | No | The paper mentions specific tools and optimizers like 'MOSES', 'RNNSEARCH', and 'Adadelta', but it does not specify version numbers for any software dependencies, such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The word embedding dimension is 620 and the hidden layer dimension is 1000. We train for 15 epochs using Adadelta (Zeiler 2012). For our model, we use the same setting as RNNSEARCH if applicable. We set the hyper-parameter λ = 1. The parameters of our model (i.e., encoder and decoder, except those related to reconstructor) are initialized by the RNNSEARCH model trained on a parallel corpus. We further train all the parameters of our model for another 10 epochs. |
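
The experiment-setup row is concrete enough to sketch in code. Below is a minimal, hedged illustration, assuming a PyTorch implementation, of how the reported hyper-parameters (620-dimensional embeddings, 1000-dimensional hidden layers, Adadelta, λ = 1, 15 epochs of baseline training followed by 10 epochs of joint training) fit into a reconstruction-augmented training objective. The `Reconstructor` class, the `training_step` helper, and the `nmt_model` interface are illustrative assumptions, not the authors' released code (none is available, per the table).

```python
# A minimal sketch (not the authors' code) of the reconstruction-augmented
# objective described in the paper: the standard NMT likelihood is combined
# with a source-reconstruction term weighted by lambda = 1.
import torch
import torch.nn as nn

EMB_DIM, HID_DIM, LAMBDA = 620, 1000, 1.0   # values reported in the paper


class Reconstructor(nn.Module):
    """Hypothetical reconstructor that re-generates the source sentence from
    the decoder hidden states. The paper's reconstructor attends over those
    states; here their mean initializes the GRU to keep the sketch short."""

    def __init__(self, src_vocab_size: int):
        super().__init__()
        self.emb = nn.Embedding(src_vocab_size, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, src_vocab_size)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, decoder_states: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        # decoder_states: (batch, tgt_len, HID_DIM); src: (batch, src_len)
        init = decoder_states.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
        hidden, _ = self.rnn(self.emb(src[:, :-1]), init)   # teacher forcing
        logits = self.out(hidden)                            # predict source tokens
        return self.loss_fn(logits.reshape(-1, logits.size(-1)),
                            src[:, 1:].reshape(-1))


def training_step(nmt_model, reconstructor, optimizer, src, tgt):
    """One joint update: loss = -log P(y|x) + LAMBDA * reconstruction loss.
    `nmt_model(src, tgt)` is assumed to return its own cross-entropy loss and
    the decoder hidden states; this interface is an assumption."""
    optimizer.zero_grad()
    nmt_loss, decoder_states = nmt_model(src, tgt)
    rec_loss = reconstructor(decoder_states, src)
    loss = nmt_loss + LAMBDA * rec_loss
    loss.backward()
    optimizer.step()
    return loss.item()


# Per the reported schedule: the encoder-decoder is first trained alone for
# 15 epochs with Adadelta, then all parameters (now including the
# reconstructor) are trained jointly for another 10 epochs, e.g.:
# optimizer = torch.optim.Adadelta(
#     list(nmt_model.parameters()) + list(reconstructor.parameters()))
```

Only the weighted-sum objective, optimizer choice, and hyper-parameter values come from the table above; the attention mechanism of the actual reconstructor is deliberately simplified away in this sketch.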