IntroVNMT: An Introspective Model for Variational Neural Machine Translation

Authors: Xin Sheng, Linli Xu, Junliang Guo, Jingchang Liu, Ruoyu Zhao, Yinlong Xu

AAAI 2020, pp. 8830-8837

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on different translation tasks demonstrate that the proposed model can achieve significant improvements over the vanilla variational NMT model.
Researcher Affiliation | Academia | (1) School of Computer Science and Technology, University of Science and Technology of China; (2) Department of Computer Science and Engineering, Hong Kong University of Science and Technology
Pseudocode | Yes | Algorithm 1: Training Process of IntroVNMT (an illustrative training-loop sketch is given after the table)
Open Source Code | No | The paper does not provide any explicit statement or link to the source code for the methodology described.
Open Datasets | Yes | For the EN-DE translation task, we use the same datasets as (Zhang et al. 2016). Our training set consists of 4.45M sentence pairs with 116.1M English words and 108.9M German words. We use news-test 2013 as the validation set and news-test 2015 as the test set. For the DE-EN translation task, we select the dataset from the IWSLT 2014 evaluation campaign (Cettolo et al. 2014), consisting of training/validation/test corpus with approximately 153K, 7K and 6.5K bilingual sentence pairs respectively. The preprocessed data can be found and downloaded from http://nlp.stanford.edu/projects/nmt/
Dataset Splits | Yes | We use news-test 2013 as the validation set and news-test 2015 as the test set. For the DE-EN translation task, we select the dataset from the IWSLT 2014 evaluation campaign (Cettolo et al. 2014), consisting of training/validation/test corpus with approximately 153K, 7K and 6.5K bilingual sentence pairs respectively.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions "RNNSearch" and the "Adam algorithm (Kingma and Ba 2015)" but does not specify version numbers for any key software components or libraries.
Experiment Setup | Yes | For the basic hyper-parameters, we set the word embedding dimension as 620, hidden layer size as 1000, learning rate as 1 × 10^-4, batch size as 80, gradient norm as 1.0 and dropout rate as 0.3 (Srivastava et al. 2014). As implemented in the VAE framework, we set the sampling number L = 1 and the dimension of the latent variable as 2000. During decoding, we adopt the beam search algorithm (Sutskever, Vinyals, and Le 2014) and set the beam size as 10 for all models. For IntroVNMT, we set m = 100, α = 1, β = 1 and γ = 1 as the default parameters respectively. The Inferer E and Decoder D are trained iteratively using the Adam algorithm (Kingma and Ba 2015) (β1 = 0.9, β2 = 0.999).
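
The reported hyper-parameters translate directly into a configuration. Below is a minimal sketch, not the authors' code, assuming a PyTorch setup: the hparams dictionary simply collects the values quoted in the Experiment Setup row, the nn.Linear modules are toy placeholders for the Inferer E and Decoder D (the real components are RNNSearch-based networks), and the two Adam optimizers use the reported learning rate and betas.

    # Minimal configuration sketch (assumptions, not the authors' code).
    # Values come from the Experiment Setup row above; the nn.Linear
    # modules are toy stand-ins for the Inferer E and Decoder D.
    import torch
    import torch.nn as nn

    hparams = dict(
        emb_dim=620,        # word embedding dimension
        hidden_size=1000,   # hidden layer size
        latent_dim=2000,    # dimension of the latent variable
        lr=1e-4,            # learning rate, 1 x 10^-4
        batch_size=80,
        grad_norm=1.0,      # gradient-norm clipping threshold
        dropout=0.3,
        sample_num=1,       # L, number of latent samples
        beam_size=10,       # beam width at decoding time
        m=100, alpha=1.0, beta=1.0, gamma=1.0,  # IntroVNMT defaults
    )

    # Toy placeholders; the real Inferer/Decoder are RNNSearch-based.
    inferer = nn.Linear(hparams["hidden_size"], 2 * hparams["latent_dim"])
    decoder = nn.Linear(hparams["latent_dim"], hparams["hidden_size"])

    # One Adam optimizer per component, with the reported betas.
    opt_E = torch.optim.Adam(inferer.parameters(), lr=hparams["lr"], betas=(0.9, 0.999))
    opt_D = torch.optim.Adam(decoder.parameters(), lr=hparams["lr"], betas=(0.9, 0.999))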
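
The Pseudocode row refers to Algorithm 1, the training process of IntroVNMT, in which the Inferer E and Decoder D are trained iteratively. The following is an illustrative alternating-update skeleton, not the paper's algorithm: the modules and the toy_loss function are placeholders, and the actual IntroVNMT objectives (reconstruction, KL and the introspective terms weighted by α, β, γ) must be taken from the paper. Only the batch size (80), gradient-norm clipping (1.0) and Adam settings are grounded in the quoted setup.

    # Illustrative alternating-update skeleton (assumptions, not Algorithm 1).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    inferer = nn.Linear(8, 4)    # toy placeholder for the Inferer E
    decoder = nn.Linear(4, 8)    # toy placeholder for the Decoder D

    opt_E = torch.optim.Adam(inferer.parameters(), lr=1e-4, betas=(0.9, 0.999))
    opt_D = torch.optim.Adam(decoder.parameters(), lr=1e-4, betas=(0.9, 0.999))

    def toy_loss(pred, target):
        # Placeholder objective; the real losses are the variational and
        # introspective terms defined in the paper.
        return nn.functional.mse_loss(pred, target)

    for step in range(3):                 # a few illustrative steps
        x = torch.randn(80, 8)            # batch of 80, as reported

        # Inferer (E) update; only opt_E steps, so D stays fixed here.
        opt_E.zero_grad()
        loss_E = toy_loss(decoder(inferer(x)), x)
        loss_E.backward()
        torch.nn.utils.clip_grad_norm_(inferer.parameters(), max_norm=1.0)
        opt_E.step()

        # Decoder (D) update; the latent code is detached so E stays fixed.
        opt_D.zero_grad()
        loss_D = toy_loss(decoder(inferer(x).detach()), x)
        loss_D.backward()
        torch.nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=1.0)
        opt_D.step()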