Improved English to Russian Translation by Neural Suffix Prediction
Authors: Kai Song, Yue Zhang, Min Zhang, Weihua Luo
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study this method and compare it with previous work on reducing OOV rates (Sennrich, Haddow, and Birch, 2015b; Lee, Cho, and Hofmann, 2016). Results show that our method gives significant improvement on the English to Russian translation task on two different domains and two popular NMT architectures. We also verify our method on training data consisting of 50M bilingual sentences, which proves that this method works effectively on large-scale corpora. Experiments: We run our experiments on English to Russian (En-RU) data under two significantly different domains, namely the news domain and the e-commerce domain. We verify our method on both the RNN-based NMT architecture and the Transformer-based NMT architecture. |
| Researcher Affiliation | Collaboration | Kai Song (1,2), Yue Zhang (3), Min Zhang (1), Weihua Luo (2) — 1 Soochow University, Suzhou, China; 2 Alibaba Group, Hangzhou, China; 3 Singapore University of Technology and Design, Singapore |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual descriptions, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper notes that a third-party baseline system's code is available at 'https://github.com/nyu-dl/dl4mt-c2c', but it does not release source code for the method described in this paper. |
| Open Datasets | Yes | We select 5.3M sentences from the bilingual training corpus released by the WMT2017 shared task on the news translation domain1 as our training data. We use three test sets published by the WMT2017 news translation task, namely News2014, News2015, and News2016. 1http://www.statmt.org/wmt17/translation-task.html |
| Dataset Splits | No | The paper does not provide specific details about training/test/validation dataset splits, such as percentages or sample counts for each split. It only explicitly mentions 'test set' for evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only mentions general concepts like 'graphics memory size' and 'GPU memory limitation'. |
| Software Dependencies | No | The paper mentions software like 'Adam' for optimization and 'snowball' for stemming, but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In our experiments, we set λ to 0.1 empirically. We use Adam (Kingma and Ba 2014) as our optimizing function. Beam search is adopted as our decoding algorithm. At each time step, the search space can be infeasibly large if we take all the combinations of stems and suffixes into consideration, so we use cube pruning (Huang and Chiang 2007) to obtain n-best candidates. ... We limit the source and target vocabularies to the most frequent 30K tokens for both English and Russian. ... When selecting our training data, we keep the sentences whose length is between 1 and 30. |
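The preprocessing steps quoted in the Experiment Setup row (capping the vocabulary at the 30K most frequent tokens, and keeping only sentences of length 1 to 30) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names `build_vocab` and `filter_by_length` are hypothetical, and whitespace tokenization stands in for whatever tokenizer the paper actually used.

```python
from collections import Counter


def build_vocab(sentences, max_size=30000):
    """Keep the most frequent tokens, mirroring the paper's 30K vocabulary cap."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return {tok for tok, _ in counts.most_common(max_size)}


def filter_by_length(pairs, min_len=1, max_len=30):
    """Keep sentence pairs whose source and target lengths fall within [min_len, max_len]."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if min_len <= len(src.split()) <= max_len
        and min_len <= len(tgt.split()) <= max_len
    ]
```

In a real pipeline the vocabulary would also reserve special symbols (UNK, BOS, EOS) and be built separately for the source and target sides; those details are omitted here for brevity.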