Search Engine Guided Neural Machine Translation
Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach, and that the improvement is larger when more relevant sentence pairs are retrieved. |
| Researcher Affiliation | Academia | The University of Hong Kong; New York University; CIFAR Azrieli Global Scholar |
| Pseudocode | Yes | Algorithm 1 (Greedy selection procedure to maximize the coverage of the source symbols) and Algorithm 2 (Learning for SEG-NMT); a sketch of the greedy selection procedure is given after the table. |
| Open Source Code | No | The paper mentions using Apache Lucene and provides its URL, but does not state that the authors' own code for the described methodology is open-source. |
| Open Datasets | Yes | We use the JRC-Acquis corpus (Steinberger et al. 2006) for evaluating the proposed SEG-NMT model. The JRC-Acquis corpus consists of the total body of European Union (EU) law applicable to the member states. (Corpus URL: http://optima.jrc.it/Acquis/JRC-Acquis.3.0/corpus/) |
| Dataset Splits | Yes | For each language pair, we uniformly select 3000 sentence pairs at random for both the development and test sets. The rest is used as a training set, after removing any sentence which contains special characters only. (A sketch of this split procedure is given after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as Apache Lucene, the Adam optimizer, GRUs, and BPE, but does not specify version numbers for any of the key software components used in the implementation (e.g., the deep-learning framework or Python version). |
| Experiment Setup | Yes | We use a standard attention-based neural machine translation model (Bahdanau, Cho, and Bengio 2014) with 1,024 gated recurrent units (GRU) (Cho et al. 2014) on each of the encoder and decoder. We train both the vanilla model and the proposed SEG-NMT from scratch with this configuration using Adam (Kingma and Ba 2014), with the initial learning rate set to 0.001. We use a minibatch of up to 32 sentence pairs. For evaluation, we use beam search with the width set to 5. In the case of the proposed SEG-NMT, we parametrize the metric matrix M in the similarity score function from Eq. (7) to be diagonal and initialize it to an identity matrix. λ in Eq. (7) is initialized to 0. The gating network fgate is a feedforward network with a single hidden layer, just like the attention mechanism fatt. (A sketch of the similarity score is given after the table.) |
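The Pseudocode row refers to Algorithm 1, a greedy procedure that selects retrieved sentence pairs so as to maximize coverage of the source symbols. Below is a minimal Python sketch of such a coverage-maximizing greedy selection; the function name `greedy_select`, the candidate format, and the stopping criterion are illustrative assumptions, not the authors' exact algorithm.

```python
def greedy_select(source_tokens, candidates, max_pairs=5):
    """Greedily pick retrieved sentence pairs that cover the most
    source-side symbols not yet covered (a sketch of a coverage-
    maximizing selection, not the paper's exact Algorithm 1).

    source_tokens: list of tokens in the current source sentence
    candidates:    list of (cand_source_tokens, cand_target_tokens) pairs
    """
    candidates = list(candidates)        # avoid mutating the caller's list
    uncovered = set(source_tokens)
    selected = []
    while uncovered and len(selected) < max_pairs:
        # Score each remaining candidate by how many uncovered tokens it covers.
        best_idx, best_gain = None, 0
        for i, (cand_src, _) in enumerate(candidates):
            gain = len(uncovered & set(cand_src))
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:             # no candidate adds coverage -> stop
            break
        cand_src, cand_tgt = candidates.pop(best_idx)
        selected.append((cand_src, cand_tgt))
        uncovered -= set(cand_src)
    return selected
```

For example, with `source_tokens = "the council adopted the directive".split()` and a handful of retrieved pairs, the loop keeps adding pairs only as long as each new pair covers at least one previously uncovered source token.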
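The Dataset Splits row describes uniformly sampling 3,000 sentence pairs each for the development and test sets and using the remainder for training, after removing sentences that contain special characters only. The following is a minimal sketch of that split under those stated sizes; the filtering regex and the names `split_corpus` and `keeps_real_text` are assumptions, since the paper does not specify the exact filtering rule.

```python
import random
import re

def split_corpus(pairs, dev_size=3000, test_size=3000, seed=0):
    """Uniformly sample dev/test sets of `dev_size`/`test_size` pairs each;
    the remainder becomes the training set after dropping sentence pairs
    that contain special characters only (the exact filter is not given
    in the paper; the regex below is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)

    dev = shuffled[:dev_size]
    test = shuffled[dev_size:dev_size + test_size]
    rest = shuffled[dev_size + test_size:]

    def keeps_real_text(pair):
        # Keep a pair only if both sides contain at least one word character.
        src, tgt = pair
        return bool(re.search(r"\w", src)) and bool(re.search(r"\w", tgt))

    train = [p for p in rest if keeps_real_text(p)]
    return train, dev, test
```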
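The Experiment Setup row states that the metric matrix M in the similarity score of Eq. (7) is constrained to be diagonal and initialized to the identity, with λ initialized to 0. Below is a minimal NumPy sketch of a bilinear similarity with a diagonal metric and a scalar term, matching that initialization; the exact functional form of Eq. (7) is not reproduced here, and the names `diag_m`, `lam`, and `similarity` are assumptions made for illustration.

```python
import numpy as np

dim = 1024                 # hidden size reported in the paper (1,024 GRU units)
diag_m = np.ones(dim)      # diagonal of M, initialized to the identity matrix
lam = 0.0                  # λ from Eq. (7), initialized to 0

def similarity(h, h_retrieved):
    """Bilinear similarity h^T M h' with a diagonal M, plus a scalar term.
    A sketch of the kind of score Eq. (7) parametrizes, not its exact form."""
    return float(h @ (diag_m * h_retrieved)) + lam
```

Keeping M diagonal reduces the number of learned metric parameters from dim^2 to dim, and initializing it to the identity makes the score start out as a plain dot product between the two hidden states.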