Neural Machine Translation with Adequacy-Oriented Learning

Authors: Xiang Kong, Zhaopeng Tu, Shuming Shi, Eduard Hovy, Tong Zhang

AAAI 2019, pp. 6618-6625

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on Chinese⇒English and German⇒English translation tasks, using both the RNN-based NMT model (Bahdanau, Cho, and Bengio 2015) and the recently proposed TRANSFORMER (Vaswani et al. 2017). The consistent improvements across language pairs and NMT architectures demonstrate the effectiveness and universality of the proposed approach.
Researcher Affiliation | Collaboration | Xiang Kong, Carnegie Mellon University, xiangk@andrew.cmu.edu; Zhaopeng Tu, Tencent AI Lab, zptu@tencent.com; Shuming Shi, Tencent AI Lab, shumingshi@tencent.com; Eduard Hovy, Carnegie Mellon University, hovy@cs.cmu.edu; Tong Zhang, Tencent AI Lab, bradymzhang@tencent.com
Pseudocode | No | The paper describes algorithms but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions implementing their approach on an open-source toolkit (THUMT) for the TRANSFORMER model, but does not state that their own code for the proposed method is open-sourced or provide a link.
Open Datasets | Yes | We conduct experiments on the widely-used Chinese (Zh)⇒English (En) and German (De)⇒English (En) translation tasks. For Zh⇒En translation, the training corpus contains 1.25M sentence pairs extracted from LDC corpora. The NIST 2002 (MT02) dataset is the validation set, and the test data consists of NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), and NIST 2006 (MT06). For De⇒En translation, to compare with the results reported by previous work (Shen et al. 2016; Bahdanau et al. 2017; Wu et al. 2017; Vaswani et al. 2017), we use both the IWSLT 2014 and WMT 2014 data.
Dataset Splits | Yes | For Zh⇒En translation, the training corpus contains 1.25M sentence pairs extracted from LDC corpora. The NIST 2002 (MT02) dataset is the validation set, and the test data consists of NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), and NIST 2006 (MT06).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions frameworks like RNNSEARCH and TRANSFORMER, and a toolkit called THUMT, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The hyper-parameter α, which controls the sharpness of the generator distribution in our system, is 1e-4, and can also be regarded as a baseline to reduce the variance of the REINFORCE algorithm. We also randomly choose 50% of the mini-batches to be trained with our objective function and the other 50% with the MLE principle. In the MRT training strategy (Shen et al. 2016), the sample size is 25, the hyper-parameter α is 5e-3, and the loss function is the negative smoothed sentence-level BLEU. We validate our models on two representative model architectures, namely RNNSEARCH and TRANSFORMER. For the RNNSEARCH model, the mini-batch size is 80, the word-embedding dimension is 620, and the hidden layer size is 1000. We use a neural coverage model for RNNSEARCH-COVERAGE, and the dimensionality of the coverage vector is 100. The baseline models are trained for 15 epochs and are used as the initial generator in the proposed framework.
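
To make the reported setup easier to scan, below is a minimal sketch in Python that simply collects the hyper-parameters quoted in the "Experiment Setup" row and illustrates the reported 50/50 routing of mini-batches between the adequacy-oriented objective and MLE. This is not the authors' released code; all identifiers (REPORTED_SETUP, pick_objective, key names) are hypothetical, and only the numeric values come from the paper's text.

```python
import random

# Hyper-parameters as quoted from the paper; the dictionary layout is an assumption.
REPORTED_SETUP = {
    "adequacy_oriented": {
        "alpha": 1e-4,                 # sharpness of the generator distribution; also acts as a variance-reduction baseline
        "adequacy_batch_ratio": 0.5,   # 50% of mini-batches use the proposed objective, the rest plain MLE
    },
    "mrt_comparison": {                # MRT training strategy (Shen et al. 2016) used as a comparison
        "sample_size": 25,
        "alpha": 5e-3,
        "loss": "negative smoothed sentence-level BLEU",
    },
    "rnnsearch": {
        "batch_size": 80,
        "embedding_dim": 620,
        "hidden_size": 1000,
        "coverage_dim": 100,           # only for RNNSEARCH-COVERAGE
        "pretrain_epochs": 15,         # baseline generator training before adequacy-oriented training
    },
}

def pick_objective(ratio=REPORTED_SETUP["adequacy_oriented"]["adequacy_batch_ratio"]):
    """Randomly route a mini-batch to the adequacy-oriented objective or to MLE."""
    return "adequacy" if random.random() < ratio else "mle"

if __name__ == "__main__":
    counts = {"adequacy": 0, "mle": 0}
    for _ in range(1000):
        counts[pick_objective()] += 1
    print(counts)  # roughly a 50/50 split, matching the description in the paper
```

A sketch like this is only a convenience summary of the quoted numbers; the paper itself does not release a configuration file, which is consistent with the "Open Source Code: No" finding above.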