Joint Training for Pivot-based Neural Machine Translation
Authors: Yong Cheng, Qian Yang, Yang Liu, Maosong Sun, Wei Xu
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Europarl and WMT corpora show that joint training of the source-to-pivot and pivot-to-target models leads to significant improvements over independent training across several language pairs (a hedged sketch of the joint objective follows the table). |
| Researcher Affiliation | Academia | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China |
| Pseudocode | No | The paper describes mathematical formulations and algorithmic steps in text but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | Table 1 shows the statistics of the Europarl and WMT corpora used in our experiments. ... The WMT corpus is composed of the Common Crawl, News Commentary, Europarl v7 and UN corpora. |
| Dataset Splits | Yes | The WMT 2006 shared task datasets are used as the development and test sets. ... The newstest2011 and newstest2012 datasets serve as development and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions software like "tokenize.perl", "multi-bleu.perl", "RNNSEARCH [Bahdanau et al., 2015]", and "byte pair encoding [Sennrich et al., 2016b]" but does not provide specific version numbers for these software components or libraries. |
| Experiment Setup | Yes | We set the vocabulary size of all the languages to 30K... We set the size of sub-words to 43K, 33K, and 43K... We set the hyper-parameter λ for balancing between likelihood and the connection term to 1.0. The threshold of gradients is set to 0.1... We set k to 10 for calculating top-k lists. |
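
As context for the Research Type and Experiment Setup rows, the objective quoted above has the form J = L(source→pivot) + L(pivot→target) + λ · J_conn, where J_conn is a connection term linking the two models through the shared pivot language and λ is the balancing hyper-parameter set to 1.0 in the paper. Below is a minimal PyTorch sketch of such a joint loss; the function name, the tensor arguments, and the L2 embedding-agreement connection term are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def joint_loss(nll_src2pvt: torch.Tensor,
               nll_pvt2tgt: torch.Tensor,
               pvt_emb_as_output: torch.Tensor,
               pvt_emb_as_input: torch.Tensor,
               lambda_conn: float = 1.0) -> torch.Tensor:
    """Joint objective: the two translation losses plus a lambda-weighted
    connection term tying the models together via the pivot language.

    The L2 agreement penalty between the two models' pivot-word embedding
    tables is one plausible connection term, assumed here for illustration.
    """
    connection = torch.sum((pvt_emb_as_output - pvt_emb_as_input) ** 2)
    return nll_src2pvt + nll_pvt2tgt + lambda_conn * connection

# Toy usage: random tensors stand in for the per-batch losses and the
# pivot-word embedding tables of the two RNNSEARCH-style models.
nll_a = torch.tensor(3.2)            # source->pivot batch loss
nll_b = torch.tensor(2.8)            # pivot->target batch loss
emb_out = torch.randn(30000, 620)    # 30K pivot vocabulary (as in the paper), illustrative dim
emb_in = torch.randn(30000, 620)
loss = joint_loss(nll_a, nll_b, emb_out, emb_in, lambda_conn=1.0)
print(loss.item())
```

In this sketch, setting lambda_conn to 0 recovers independent training of the two models; larger values trade translation likelihood for agreement on the pivot-language representations.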