Unsupervised Neural Machine Translation with SMT as Posterior Regularization
Authors: Shuo Ren, Zhirui Zhang, Shujie Liu, Ming Zhou, Shuai Ma
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on en-fr and en-de translation tasks show that our method significantly outperforms the strong baseline (Lample et al. 2018) and achieves the new state-of-the-art translation performance in unsupervised machine translation. |
| Researcher Affiliation | Collaboration | Shuo Ren (1), Zhirui Zhang (2), Shujie Liu (3), Ming Zhou (3), Shuai Ma (1); (1) SKLSDE Lab, Beihang University and Beijing Advanced Innovation Center for Big Data and Brain Computing, China; (2) University of Science and Technology of China, Hefei, China; (3) Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1: Unsupervised NMT with SMT as PR |
| Open Source Code | No | The paper does not provide an explicit statement or link to its own open-source code for the methodology described. |
| Open Datasets | Yes | For each language, we use 50 million monolingual sentences in News Crawl, a monolingual dataset from WMT, which is the same as the previous work (Artetxe et al. 2017; Lample et al. 2018). |
| Dataset Splits | No | The paper mentions 'newstest 2014' and 'newstest 2016' as test sets but does not specify explicit training/validation/test splits, nor does it provide details about a validation set used for their NMT models. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions several software tools like Moses, word2vec, vecmap, Transformer (via tensor2tensor), and Salm, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We share the vocabulary space of 50,000 BPE codes (Sennrich, Haddow, and Birch 2015) for source and target languages. For each language pair, we train two independent NMT models for different translation directions (i.e., source to target and target to source) with shared embedding layers of source and target sides. ... In that stage, there are three hyperparameters described in Section 3.2 that should be taken into account, i.e., the peakiness controller λ, the vocabulary size S or T, and the number of translation candidates k for each word. |
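
The Experiment Setup row names three hyperparameters: the peakiness controller λ, the vocabulary size S or T, and the number of translation candidates k per word. Below is a minimal, runnable sketch of how these quantities could interact when deriving an initial word-translation table from cross-lingual embeddings; all variable names, toy sizes, and the random embeddings are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

# Hypothetical sketch: how a peakiness controller (LAMBDA), vocabulary sizes
# (S, T), and a per-word candidate count (K) might shape an initial
# word-translation table built from cross-lingual embeddings.
# Values below are toy placeholders (the paper uses a 50k BPE vocabulary).
LAMBDA = 30.0      # peakiness controller for the translation distribution
S, T = 1000, 1200  # source / target vocabulary sizes
K = 10             # translation candidates kept per source word
DIM = 64           # embedding dimensionality

rng = np.random.default_rng(0)
src_emb = rng.normal(size=(S, DIM))  # stand-ins for aligned cross-lingual embeddings
tgt_emb = rng.normal(size=(T, DIM))

# Cosine similarity between every source and target word embedding.
src_norm = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
tgt_norm = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
sim = src_norm @ tgt_norm.T          # shape (S, T)

# Keep the top-K target candidates per source word and renormalize with a
# peaked softmax; larger LAMBDA concentrates mass on the nearest candidates.
topk_idx = np.argsort(-sim, axis=1)[:, :K]
topk_sim = np.take_along_axis(sim, topk_idx, axis=1)
scores = np.exp(LAMBDA * topk_sim)
probs = scores / scores.sum(axis=1, keepdims=True)  # p(t | s) over K candidates

print("candidates for source word 0:", topk_idx[0])
print("their probabilities:", np.round(probs[0], 3))
```

In this reading, λ behaves as the "peakiness controller" the quoted setup refers to: larger values sharpen the per-word translation distribution, while k bounds how many candidates survive into the table.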