Towards Neural Phrase-based Machine Translation
Authors: Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, Li Deng
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that NPMT achieves superior performances on IWSLT 2014 German-English/English-German and IWSLT 2015 English-Vietnamese machine translation tasks compared with strong NMT baselines. In this section, we evaluate our model on the IWSLT 2014 German-English (Cettolo et al., 2014), IWSLT 2014 English-German, and IWSLT 2015 English-Vietnamese (Cettolo et al., 2015) machine translation tasks. |
| Researcher Affiliation | Collaboration | Microsoft Research, Google, University of Illinois at Urbana-Champaign, Citadel; pshuang@microsoft.com, {chongw, dennyzhou}@google.com, shuang91@illinois.edu, l.deng@ieee.org |
| Pseudocode | No | The paper describes algorithmic details in text, such as 'SWAN can be also understood via a generative model' and 'Greedy decoding for SWAN is straightforward,' but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block or figure. |
| Open Source Code | Yes | 1The source code is available at https://github.com/posenhuang/NPMT. |
| Open Datasets | Yes | In this section, we evaluate our model on the IWSLT 2014 German-English (Cettolo et al., 2014), IWSLT 2014 English-German, and IWSLT 2015 English-Vietnamese (Cettolo et al., 2015) machine translation tasks. |
| Dataset Splits | Yes | The data comes from translated TED talks, and the dataset contains roughly 153K training sentences, 7K development sentences, and 7K test sentences. We use the same preprocessing and dataset splits as in Ranzato et al. (2015); Wiseman & Rush (2016); Bahdanau et al. (2017). We use the TED tst2012 (1553 sentences) as a validation set for hyperparameter tuning and TED tst2013 (1268 sentences) as a test set. |
| Hardware Specification | Yes | NPMT takes about 2-3 days to run to convergence (40 epochs) on a machine with four M40 GPUs. NPMT takes about one day to run to convergence (15 epochs) on a machine with 4 M40 GPUs. |
| Software Dependencies | No | The paper mentions the use of the 'Adam algorithm (Kingma & Ba, 2014)' for optimization and the 'KenLM implementation (Heafield et al., 2013)' for language modeling, but it does not specify version numbers for any software libraries or frameworks used in the implementation (e.g., TensorFlow, PyTorch, or a specific KenLM version). |
| Experiment Setup | Yes | We report our IWSLT 2014 German-English experiments using one reordering layer with window size 7, two layers of bi-directional GRU encoder (Gated recurrent unit, Chung et al. (2014)) with 256 hidden units, and two layers of unidirectional GRU decoder with 512 hidden units. We add dropout with a rate of 0.5 in the GRU layer. We choose GRU since baselines for comparisons were using GRU. The maximum segment length is set to 6. Batch size is set as 32 (per GPU) and the Adam algorithm (Kingma & Ba, 2014) is used for optimization with an initial learning rate of 0.001. For decoding, we use greedy search and beam search with a beam size of 10. |
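
The quoted experiment setup can be condensed into a short configuration sketch. The snippet below is a minimal PyTorch illustration of only the hyperparameters reported above (two-layer bi-directional GRU encoder with 256 hidden units, two-layer unidirectional GRU decoder with 512 hidden units, dropout 0.5, Adam with an initial learning rate of 0.001, batch size 32 per GPU, maximum segment length 6, beam size 10). The embedding dimensions and vocabulary sizes are placeholders not stated in the quote, and the reordering layer and SWAN segmentation loss are omitted, so this is not the authors' released implementation (see https://github.com/posenhuang/NPMT).

```python
import torch
import torch.nn as nn

# Sketch of the reported IWSLT'14 De-En configuration. Only the quoted
# hyperparameters are reproduced; NPMT's local reordering layer and the
# SWAN segment-level loss are not implemented here.

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden=256, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # embed_dim is a placeholder
        # Two layers of bi-directional GRU with 256 hidden units, dropout 0.5
        self.rnn = nn.GRU(embed_dim, hidden, num_layers=layers,
                          bidirectional=True, dropout=dropout, batch_first=True)

    def forward(self, src_tokens):
        # Returns (outputs of shape [B, T, 2*hidden], final hidden state)
        return self.rnn(self.embed(src_tokens))


class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden=512, layers=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # embed_dim is a placeholder
        # Two layers of unidirectional GRU with 512 hidden units, dropout 0.5
        self.rnn = nn.GRU(embed_dim, hidden, num_layers=layers,
                          dropout=dropout, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tgt_tokens, state=None):
        out, state = self.rnn(self.embed(tgt_tokens), state)
        return self.proj(out), state


# Training and decoding settings quoted from the paper
SRC_VOCAB = 32000   # placeholder; vocabulary sizes are not given in the quote
TGT_VOCAB = 32000   # placeholder
encoder = Encoder(SRC_VOCAB)
decoder = Decoder(TGT_VOCAB)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # Adam, initial learning rate 0.001

BATCH_SIZE_PER_GPU = 32   # batch size 32 per GPU
MAX_SEGMENT_LENGTH = 6    # used by SWAN's segment decoding (not shown here)
BEAM_SIZE = 10            # beam search width; greedy decoding also reported
```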