Mirror-Generative Neural Machine Translation

Authors: Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that the proposed MGNMT consistently outperforms existing approaches in a variety of language pairs and scenarios, including resource-rich and low-resource situations."
Researcher Affiliation | Collaboration | "1 National Key Laboratory for Novel Software Technology, Nanjing University zhengzx@smail.nju.edu.cn, {huangsj,daixinyu,chenjj}@nju.edu.cn; 2 ByteDance AI Lab {zhouhao.nlp,lileilab}@bytedance.com"
Pseudocode | Yes | "Algorithm 1: Training MGNMT from Non-Parallel Data; Algorithm 2: MGNMT Decoding with EM Algorithm." (A hedged sketch of the EM-style decoding loop is given after this table.)
Open Source Code | No | The paper contains no explicit statement or link providing access to the source code for the proposed MGNMT method.
Open Datasets | Yes | "To evaluate our model in resource-poor scenarios, we conducted experiments on WMT16 English-to/from-Romanian (WMT16 EN↔RO) translation task... As for resource-rich scenarios, we conducted experiments on WMT14 English-to/from-German (WMT14 EN↔DE), NIST English-to/from-Chinese (NIST EN↔ZH) translation tasks. For all the languages, we use the non-parallel data from News Crawl, except for NIST EN↔ZH, where the Chinese monolingual data were extracted from LDC corpus."
Dataset Splits | Yes | Dev/Test sets per task (Table 1 caption): newstest2013/newstest2014 (WMT14 EN↔DE), MT06/MT03 (NIST EN↔ZH), newstest2015/newstest2016 (WMT16 EN↔RO), and tst13/14 & newstest2014 (cross-domain EN↔DE). In addition, Table 2 lists the best KL-annealing setting for each task, chosen on the development sets.
Hardware Specification | Yes | "We trained our models on a single GTX 1080Ti GPU."
Software Dependencies | No | "We implemented our models on top of Transformer (Vaswani et al., 2017), as well as RNMT (Bahdanau et al., 2015) and GNMT (Shah & Barber, 2018), in PyTorch." (The accompanying footnote mentions PyTorch, but without a version.)
Experiment Setup | Yes | "For all language pairs, sentences were encoded using byte pair encoding (Sennrich et al., 2016a, BPE) with 32k merge operations... We used the Adam optimizer (Kingma & Ba, 2014) with the same learning rate schedule strategy as Vaswani et al. (2017), with 4k warmup steps. Each mini-batch consists of about 4,096 source and target tokens respectively... For all experiments, word dropout rates were set to a constant of 0.3. Honestly, annealing KL weight is somewhat tricky. Table 2 lists our best setting of KL-annealing for each task on the development sets." (A hedged sketch of this optimizer and schedule setup also follows the table.)
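
The pseudocode row above refers to Algorithm 2, MGNMT Decoding with EM. Below is a minimal Python sketch of how such an EM-style decoding loop can look; it is not the authors' implementation, and the model interface (sample_prior, infer_latent, beam_search, reconstruction_score) is a hypothetical stand-in for MGNMT's shared latent, its approximate posterior, and its paired translation/language models.

```python
# Hypothetical sketch of an EM-style decoding loop in the spirit of
# MGNMT's Algorithm 2. All methods on `model` are assumed interfaces:
#   model.sample_prior()                -> latent z drawn from the prior
#   model.infer_latent(x, y)            -> z ~ q(z | x, y)          (E-step)
#   model.beam_search(x, z, beam_size)  -> candidates ranked by
#                                          log p(y | x, z) + log p(y | z)
#   model.reconstruction_score(x, y, z) -> log p(x | y, z) + log p(x | z)

def mirror_decode(model, x, num_iters=3, beam_size=5):
    """Iteratively refine the translation of source sentence x."""
    # Initial draft decoded with a latent sampled from the prior (assumption).
    z = model.sample_prior()
    y = model.beam_search(x, z, beam_size)[0]

    for _ in range(num_iters):
        # E-step: re-estimate the shared latent from the current pair (x, y).
        z = model.infer_latent(x, y)

        # M-step: re-decode so that translation and language model scores
        # jointly select candidates, then rerank with the reverse
        # (reconstruction) direction, which the mirror structure provides.
        candidates = model.beam_search(x, z, beam_size)
        y = max(candidates,
                key=lambda cand: model.reconstruction_score(x, cand, z))
    return y
```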
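The experiment-setup row quotes the Adam optimizer with the learning-rate schedule of Vaswani et al. (2017), 4k warmup steps, constant word dropout of 0.3, and per-task KL annealing. The short PyTorch sketch below makes that setup concrete; the model dimension, the Adam betas/eps, and the KL-annealing length are illustrative assumptions not stated in the quoted passage.

```python
import torch

D_MODEL = 512          # assumed model dimension (not stated in the quote)
WARMUP_STEPS = 4000    # 4k warmup steps, as reported
WORD_DROPOUT = 0.3     # constant word-dropout rate, as reported

def noam_lr(step: int) -> float:
    """Transformer schedule: d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # guard against step 0
    return (D_MODEL ** -0.5) * min(step ** -0.5, step * WARMUP_STEPS ** -1.5)

def kl_weight(step: int, anneal_steps: int = 20_000) -> float:
    """Linear KL annealing; the length is tuned per task (Table 2), so
    `anneal_steps` here is only a placeholder."""
    return min(1.0, step / anneal_steps)

# Placeholder module standing in for the MGNMT parameters.
model = torch.nn.Linear(D_MODEL, D_MODEL)

# Base lr of 1.0 so LambdaLR yields exactly noam_lr(step); the betas/eps are
# the usual Transformer values and are an assumption here.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
```

In a training loop one would call optimizer.step() followed by scheduler.step() after each update and scale the KL term of the ELBO by kl_weight(step).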