Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Mirror-Generative Neural Machine Translation
Authors: Zaixiang Zheng, Hao Zhou, Shujian Huang, Lei Li, Xin-Yu Dai, Jiajun Chen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed MGNMT consistently outperforms existing approaches in a variety of language pairs and scenarios, including resource-rich and low-resource situations. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University; (2) ByteDance AI Lab |
| Pseudocode | Yes | Algorithm 1 Training MGNMT from Non-Parallel Data; Algorithm 2 MGNMT Decoding with EM Algorithm |
| Open Source Code | No | The paper does not contain any explicit statement or link providing concrete access to the source code for the proposed MGNMT methodology. |
| Open Datasets | Yes | Dataset: To evaluate our model in resource-poor scenarios, we conducted experiments on WMT16 English-to/from-Romanian (WMT16 EN↔RO) translation task... As for resource-rich scenarios, we conducted experiments on WMT14 English-to/from-German (WMT14 EN↔DE), NIST English-to/from-Chinese (NIST EN↔ZH) translation tasks. For all the languages, we use the non-parallel data from News Crawl, except for NIST EN↔ZH, where the Chinese monolingual data were extracted from LDC corpus. |
| Dataset Splits | Yes | Dev/Test: newstest2013/14; MT06/MT03; newstest2015/16; tst13/14 & newstest2014 (Table 1 caption). Also, Table 2 lists our best setting of KL-annealing for each task on the development sets. |
| Hardware Specification | Yes | We trained our models on a single GTX 1080ti GPU. |
| Software Dependencies | No | We implemented our models on the top of Transformer (Vaswani et al., 2017) and RNMT (Bahdanau et al., 2015) and GNMT (Shah & Barber, 2018) as well on Pytorch³. (Footnote 3 mentions PyTorch, but without a version.) |
| Experiment Setup | Yes | For all language pairs, sentences were encoded using byte pair encoding (Sennrich et al., 2016a, BPE) with 32k merge operations... We used the Adam optimizer (Kingma & Ba, 2014) with the same learning rate schedule strategy as Vaswani et al. (2017) with 4k warmup steps. Each mini-batch consists of about 4,096 source and target tokens respectively... For all experiments, word dropout rates were set to a constant of 0.3. Honestly, annealing KL weight is somewhat tricky. Table 2 lists our best setting of KL-annealing for each task on the development sets. |
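
The Experiment Setup row cites the learning rate schedule of Vaswani et al. (2017) with 4k warmup steps. A minimal sketch of that schedule follows, for illustration only. The function name `transformer_lr` and the default `d_model=512` are assumptions (this summary does not state the model dimension), and this is not the authors' code.

```python
def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Warmup schedule from Vaswani et al. (2017).

    The rate grows linearly for `warmup_steps` updates, then decays
    proportionally to the inverse square root of the step number.
    `d_model=512` is an assumed value, not taken from the paper summary.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


if __name__ == "__main__":
    # Print the rate at a few points: during warmup, at the peak, and after it.
    for step in (1, 1000, 4000, 16000):
        print(step, f"{transformer_lr(step):.6f}")
```

Under these assumptions the rate peaks at step 4,000 and halves by step 16,000. In PyTorch, a function like this could be wrapped in `torch.optim.lr_scheduler.LambdaLR` with the optimizer's base learning rate set to 1.0, though the summary does not say how the authors wired it up.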