Generative Neural Machine Translation

Authors: Harshil Shah, David Barber

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section we evaluate the effectiveness of GNMT and GNMT-MULTI on the 6 permutations of language pairs between English (EN), Spanish (ES) and French (FR), i.e. EN→ES, ES→EN, EN→FR, etc. We also train GNMT-MULTI in a semi-supervised manner, as described in section 2.6, and refer to this as GNMT-MULTI-SSL. We compare the performance of GNMT, GNMT-MULTI, and GNMT-MULTI-SSL against that of VNMT, which we believe to be the most closely related model to our work."
Researcher Affiliation | Collaboration | Harshil Shah¹ and David Barber¹,²,³ (¹University College London, ²Alan Turing Institute, ³reinfer.io)
Pseudocode | Yes | Algorithm 1: Generating translations; Algorithm 2: Translating when there are missing words (an illustrative sketch of the Algorithm 1 loop follows the table)
Open Source Code | No | No explicit statement or link for an open-source code release is provided.
Open Datasets | Yes | "We use paired data provided by the Multi UN corpus [Tiedemann, 2012]. [...] For the monolingual data used to train GNMT-MULTI-SSL, we use the News Crawl articles from 2009 to 2012, provided for the WMT 13 translation task."
Dataset Splits | Yes | "We train each model with a small, medium and large amount of paired data, corresponding to 40K, 400K and 4M paired sentences respectively. For each language pair, we create validation sets of size 5K and test sets of size 10K paired sentences respectively."
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup.
Software Dependencies | No | "We implement both models in Python, using the Theano [Theano Development Team, 2016] and Lasagne [Dieleman et al., 2015] libraries." (The libraries are named, but no specific version numbers are given.)
Experiment Setup | Yes | "The latent representation z has 100 units, each of the RNN hidden states has 1,000 units, and the word embeddings are 300-dimensional. [...] KL divergence annealing: We multiply the KL divergence term by a constant weight, which we linearly anneal from 0 to 1 over the first 50,000 iterations of training. [...] Word dropout: [...] This is parameterized by a drop rate, which we set to 30%." (A sketch of these two training tricks follows the table.)
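
The algorithms referenced in the Pseudocode row are iterative procedures that alternate between inferring the latent representation z and decoding a translation. The sketch below illustrates that generate-and-refine loop in plain Python; `posterior_mean` and `beam_search_decode` are hypothetical helper names standing in for the model's inference network and decoder, not the authors' implementation.

    # Illustrative sketch of an iterative generate-and-refine translation
    # loop in the spirit of the paper's Algorithm 1. `posterior_mean` and
    # `beam_search_decode` are hypothetical helpers, not the authors' code.

    def translate(source_tokens, model, max_iters=10):
        # With no translation yet, estimate z from the source sentence alone.
        z = model.posterior_mean(source=source_tokens, target=None)
        translation = model.beam_search_decode(z, source_tokens)
        for _ in range(max_iters):
            # Re-estimate z from both sentences, then re-decode; stop once
            # the translation no longer changes between iterations.
            z = model.posterior_mean(source=source_tokens, target=translation)
            new_translation = model.beam_search_decode(z, source_tokens)
            if new_translation == translation:
                break
            translation = new_translation
        return translation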
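
The KL annealing schedule and word dropout rate quoted in the Experiment Setup row can be stated concretely. Below is a minimal NumPy sketch (the paper used Theano/Lasagne; this version is framework-agnostic): the KL weight is annealed linearly over the first 50,000 iterations, and word dropout is shown as replacing a random 30% of tokens with an unknown token, a common variant from the VAE-for-text literature. The constant names and the `UNK_ID` value are illustrative assumptions, and the paper's exact dropout formulation may differ.

    import numpy as np

    KL_ANNEAL_ITERS = 50000  # KL weight annealed linearly from 0 to 1 (paper)
    WORD_DROP_RATE = 0.30    # word drop rate used in the paper
    UNK_ID = 0               # hypothetical id of the unknown-word token

    def kl_weight(iteration):
        # Linear KL annealing: 0 at iteration 0, reaching 1 at iteration
        # 50,000 and staying there for the rest of training.
        return min(1.0, iteration / KL_ANNEAL_ITERS)

    def word_dropout(token_ids, rng):
        # Replace a random fraction of tokens with the unknown token so the
        # decoder cannot rely purely on its own inputs and must use z
        # (one common variant; the paper's exact formulation may differ).
        token_ids = np.asarray(token_ids)
        mask = rng.random(token_ids.shape) < WORD_DROP_RATE
        return np.where(mask, UNK_ID, token_ids)

    # Usage: at training iteration t, the annealed ELBO would be computed as
    #   loss = reconstruction_loss + kl_weight(t) * kl_divergence
    rng = np.random.default_rng(0)
    dropped = word_dropout([5, 17, 42, 3, 99], rng)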