Search Engine Guided Neural Machine Translation

Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline, and the improvement is larger when more relevant sentence pairs are retrieved.
Researcher Affiliation | Academia | The University of Hong Kong; New York University; CIFAR Azrieli Global Scholar
Pseudocode | Yes | Algorithm 1 (Greedy selection procedure to maximize the coverage of the source symbols) and Algorithm 2 (Learning for SEG-NMT); a sketch of the greedy coverage selection appears after the table.
Open Source Code | No | The paper mentions using Apache Lucene and provides its URL, but does not state that the authors' own code for the described methodology is open source.
Open Datasets | Yes | We use the JRC-Acquis corpus (Steinberger et al. 2006) for evaluating the proposed SEG-NMT model. The JRC-Acquis corpus consists of the total body of European Union (EU) law applicable to the member states. (Corpus available at http://optima.jrc.it/Acquis/JRC-Acquis.3.0/corpus/)
Dataset Splits | Yes | For each language pair, we uniformly select 3000 sentence pairs at random for both the development and test sets. The rest is used as a training set, after removing any sentence which contains special characters only. (A sketch of such a split appears after the table.)
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions tools such as Apache Lucene, the Adam optimizer, GRUs, and BPE, but does not specify version numbers for any of the key software components used for implementation (e.g., Python or deep learning framework versions).
Experiment Setup | Yes | We use a standard attention-based neural machine translation model (Bahdanau, Cho, and Bengio 2014) with 1,024 gated recurrent units (GRU) (Cho et al. 2014) on each of the encoder and decoder. We train both the vanilla model as well as the proposed SEG-NMT based on this configuration from scratch using Adam (Kingma and Ba 2014) with the initial learning rate set to 0.001. We use a minibatch of up to 32 sentence pairs. For evaluation, we use beam search with width set to 5. In the case of the proposed SEG-NMT, we parametrize the metric matrix M in the similarity score function from Eq. (7) to be diagonal and initialized to an identity matrix. λ in Eq. (7) is initialized to 0. The gating network f_gate is a feedforward network with a single hidden layer, just like the attention mechanism f_att. (A sketch of this configuration appears after the table.)
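
The Pseudocode row quotes only the caption of Algorithm 1: a greedy selection procedure that maximizes coverage of the source symbols over the retrieved sentence pairs. Below is a minimal sketch of that idea in Python, assuming the retrieved pairs arrive as (source_tokens, target_tokens) tuples and that coverage is counted by plain token overlap; the function name and the max_pairs cap are illustrative and not taken from the paper.

```python
def greedy_coverage_selection(source_tokens, retrieved_pairs, max_pairs=5):
    """Greedily pick retrieved sentence pairs that cover the most
    still-uncovered source tokens (a sketch of Algorithm 1's idea,
    not the authors' implementation)."""
    uncovered = set(source_tokens)
    selected = []
    candidates = list(retrieved_pairs)
    while uncovered and candidates and len(selected) < max_pairs:
        # Score each candidate by how many uncovered source tokens it contains.
        best = max(candidates, key=lambda pair: len(uncovered & set(pair[0])))
        gain = len(uncovered & set(best[0]))
        if gain == 0:  # no remaining candidate adds coverage; stop early
            break
        selected.append(best)
        uncovered -= set(best[0])
        candidates.remove(best)
    return selected
```

Called on a tokenized source sentence and a list of search-engine-retrieved training pairs, this returns the subset that would guide translation; it is an illustration of the selection step only, not of the full SEG-NMT pipeline.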
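
For the Dataset Splits row, the following is a minimal sketch of a uniform random split like the one quoted, assuming the corpus fits in memory as (source, target) string pairs; the "special characters only" filter is approximated with a regex, since the paper does not spell out the exact rule.

```python
import random
import re

def make_splits(sentence_pairs, dev_size=3000, test_size=3000, seed=0):
    """Uniformly sample dev and test sets; the remainder becomes training data.
    The special-character filter below is an illustrative guess, not the paper's rule."""
    rng = random.Random(seed)
    pairs = list(sentence_pairs)
    rng.shuffle(pairs)
    dev = pairs[:dev_size]
    test = pairs[dev_size:dev_size + test_size]
    rest = pairs[dev_size + test_size:]
    # Drop training sentences that contain no alphanumeric characters at all.
    train = [(s, t) for s, t in rest
             if re.search(r"\w", s) and re.search(r"\w", t)]
    return train, dev, test
```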
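
For the Experiment Setup row, here is a sketch of how the diagonal, identity-initialized metric matrix M, the scalar λ initialized to 0, and the single-hidden-layer gating network f_gate might be parametrized. The use of PyTorch, the exact inputs to the gate, and the precise form of the Eq. (7) score are all assumptions; the paper's equations are not reproduced in this summary.

```python
import torch
import torch.nn as nn

HIDDEN = 1024  # GRU size on each of the encoder and decoder, per the quoted setup

# Training settings quoted in the row above (not enforced by this sketch):
# Adam with initial learning rate 0.001; minibatches of up to 32 sentence pairs;
# beam search with width 5 at evaluation time.

class DiagonalBilinearScore(nn.Module):
    """Bilinear similarity with a diagonal metric matrix M (initialized to the
    identity) and a scalar lambda initialized to 0; the exact form of Eq. (7)
    is not reproduced here, so this scoring function is illustrative."""
    def __init__(self, dim=HIDDEN):
        super().__init__()
        self.m_diag = nn.Parameter(torch.ones(dim))   # diag(M) = 1  ->  M = I
        self.lmbda = nn.Parameter(torch.zeros(1))     # lambda starts at 0

    def forward(self, h_query, h_retrieved):
        # h_query: (batch, dim); h_retrieved: (batch, n_retrieved, dim)
        weighted = h_query.unsqueeze(1) * self.m_diag  # apply the diagonal M
        return (weighted * h_retrieved).sum(-1) + self.lmbda

class GateNetwork(nn.Module):
    """Single-hidden-layer feedforward gate, mirroring the description of f_gate;
    feeding it the concatenated decoder and retrieved states is an assumption."""
    def __init__(self, dim=HIDDEN, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, h_decoder, h_retrieved):
        return self.net(torch.cat([h_decoder, h_retrieved], dim=-1))
```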