IntroVNMT: An Introspective Model for Variational Neural Machine Translation
Authors: Xin Sheng, Linli Xu, Junliang Guo, Jingchang Liu, Ruoyu Zhao, Yinlong Xu
AAAI 2020, pp. 8830-8837 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on different translation tasks demonstrate that the proposed model can achieve significant improvements over the vanilla variational NMT model. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Technology, University of Science and Technology of China; (2) Department of Computer Science and Engineering, Hong Kong University of Science and Technology |
| Pseudocode | Yes | Algorithm 1: Training Process of IntroVNMT |
| Open Source Code | No | The paper does not provide any explicit statement or link to the source code for the methodology described. |
| Open Datasets | Yes | For the EN-DE translation task, we use the same datasets as (Zhang et al. 2016). Our training set consists of 4.45M sentence pairs with 116.1M English words and 108.9M German words. We use news-test 2013 as the validation set and news-test 2015 as the test set. For the DE-EN translation task, we select the dataset from the IWSLT 2014 evaluation campaign (Cettolo et al. 2014), consisting of training/validation/test corpus with approximately 153K, 7K and 6.5K bilingual sentence pairs respectively. The preprocessed data can be found and downloaded from http://nlp.stanford.edu/projects/nmt/ |
| Dataset Splits | Yes | We use news-test 2013 as the validation set and news-test 2015 as the test set. For the DE-EN translation task, we select the dataset from the IWSLT 2014 evaluation campaign (Cettolo et al. 2014), consisting of training/validation/test corpus with approximately 153K, 7K and 6.5K bilingual sentence pairs respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "RNNSearch" and "Adam algorithm (Kingma and Ba 2015)" but does not specify version numbers for any key software components or libraries. |
| Experiment Setup | Yes | For the basic hyper-parameters, we set the word embedding dimension as 620, hidden layer size as 1000, learning rate as 1 × 10⁻⁴, batch size as 80, gradient norm as 1.0 and dropout rate as 0.3 (Srivastava et al. 2014). As implemented in the VAE framework, we set the sampling number L = 1 and the dimension of the latent variable as 2000. During decoding, we adopt the beam search algorithm (Sutskever, Vinyals, and Le 2014) and set the beam size as 10 for all models. For IntroVNMT, we set m = 100, α = 1, β = 1 and γ = 1 as the default parameters respectively. The Inferer E and Decoder D are trained iteratively using the Adam algorithm (Kingma and Ba 2015) (β1 = 0.9, β2 = 0.999). |
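
For reference, the hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is a minimal illustration rather than the authors' code; the field names are hypothetical, while the values are taken directly from the row above.

```python
from dataclasses import dataclass

# Hypothetical configuration mirroring the "Experiment Setup" row above.
# The paper does not publish a config file, so all names here are illustrative.
@dataclass
class IntroVNMTConfig:
    emb_dim: int = 620            # word embedding dimension
    hidden_size: int = 1000       # hidden layer size
    latent_dim: int = 2000        # dimension of the latent variable
    learning_rate: float = 1e-4   # reported as 1 × 10⁻⁴
    batch_size: int = 80
    grad_norm_clip: float = 1.0   # gradient norm
    dropout: float = 0.3
    num_samples: int = 1          # sampling number L in the VAE framework
    beam_size: int = 10           # beam search width at decoding time
    m: int = 100                  # IntroVNMT default parameters
    alpha: float = 1.0
    beta: float = 1.0
    gamma: float = 1.0
    adam_betas: tuple = (0.9, 0.999)  # Adam β1, β2
```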
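
The paper also states that the Inferer E and Decoder D are trained iteratively with Adam (Algorithm 1, "Training Process of IntroVNMT"). The sketch below illustrates one way such an alternating schedule could look, assuming a PyTorch-style setup; the loss functions are passed in as placeholders because the actual IntroVNMT objectives are not reproduced in this report.

```python
import torch

def train_introvnmt_sketch(inferer, decoder, inferer_loss_fn, decoder_loss_fn,
                           data_loader, cfg):
    """Alternating Inferer/Decoder updates; a hedged sketch, not the paper's Algorithm 1."""
    # One Adam optimizer per module, using the betas reported in the paper.
    opt_e = torch.optim.Adam(inferer.parameters(), lr=cfg.learning_rate, betas=cfg.adam_betas)
    opt_d = torch.optim.Adam(decoder.parameters(), lr=cfg.learning_rate, betas=cfg.adam_betas)

    for src_batch, tgt_batch in data_loader:
        # Inferer step: only opt_e applies an update, so D's parameters stay fixed here.
        loss_e = inferer_loss_fn(inferer, decoder, src_batch, tgt_batch)  # placeholder objective
        opt_e.zero_grad()
        loss_e.backward()
        torch.nn.utils.clip_grad_norm_(inferer.parameters(), cfg.grad_norm_clip)
        opt_e.step()

        # Decoder step: only opt_d applies an update, so E's parameters stay fixed here.
        loss_d = decoder_loss_fn(inferer, decoder, src_batch, tgt_batch)  # placeholder objective
        opt_d.zero_grad()
        loss_d.backward()
        torch.nn.utils.clip_grad_norm_(decoder.parameters(), cfg.grad_norm_clip)
        opt_d.step()
```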