Variational Recurrent Neural Machine Translation

Authors: Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, Biao Zhang

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both the conventional and variational NMT models.
Researcher Affiliation | Academia | Xiamen University, Xiamen, China; Institute of Software, Chinese Academy of Sciences, Beijing, China; Soochow University, Suzhou, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions open-source or re-implemented systems used for comparison (Moses, the DL4MT tutorial) but does not provide access to its own source code for the proposed method.
Open Datasets | Yes | Our Chinese-English training data consists of 1.25M LDC sentence pairs... In English-German translation, our training data consists of 4.46M sentence pairs... We used the NIST MT02 dataset... and the NIST MT03/04/05/06 datasets... We used the news-test 2013 as the validation set and the news-test 2015 as the test set.
Dataset Splits | Yes | We used the NIST MT02 dataset as the validation set... We used the news-test 2013 as the validation set
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only general training parameters.
Software Dependencies | No | The paper mentions using Rmsprop and the Moses script but does not provide specific version numbers for these or any other software libraries or frameworks.
Experiment Setup | Yes | We applied Rmsprop (Graves 2013) with iterNum=5, momentum=0, ρ=0.95, and ϵ=1×10⁻⁴ to train various NMT models... Specifically, we set word embedding dimension as 620, hidden layer size as 1000, learning rate as 5×10⁻⁴, batch size as 80, gradient norm as 1.0, and dropout rate as 0.3. Particularly, we initialized the parameters of VRNMT with the trained conventional NMT model. As implemented in VAE, we set the sampling number L=1, and de=dz=2df=2000 according to preliminary experiments. During decoding, we used the beam-search algorithm, and set beam sizes of all models as 10.
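
As a rough aid to reproduction, the sketch below collects the hyperparameters quoted in the Experiment Setup row into one configuration and shows how the reported Rmsprop settings (ρ=0.95, momentum=0, ϵ=1×10⁻⁴) and gradient-norm clipping could be wired together. The paper does not release code, so the framework choice (PyTorch here), the stand-in model, and the dummy batch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters as quoted in the paper's experiment setup.
CONFIG = {
    "embedding_dim": 620,    # word embedding dimension
    "hidden_size": 1000,     # hidden layer size
    "latent_dims": 2000,     # de = dz = 2*df = 2000
    "learning_rate": 5e-4,
    "batch_size": 80,
    "grad_norm": 1.0,        # gradient norm clipping threshold
    "dropout": 0.3,
    "sample_num": 1,         # L = 1 Monte Carlo sample, as in VAE training
    "beam_size": 10,         # beam width at decoding time
}

# Stand-in module so the optimizer sketch is runnable; the actual model is a
# GRU-based variational encoder-decoder, not reproduced here.
model = nn.Sequential(
    nn.Linear(CONFIG["embedding_dim"], CONFIG["hidden_size"]),
    nn.Dropout(CONFIG["dropout"]),
    nn.Linear(CONFIG["hidden_size"], CONFIG["embedding_dim"]),
)

# Rmsprop with the reported rho (alpha), momentum, and epsilon values.
optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=CONFIG["learning_rate"],
    alpha=0.95,
    eps=1e-4,
    momentum=0.0,
)

# One illustrative update on dummy data; real training minimizes the negative
# evidence lower bound over minibatches of 80 sentence pairs.
x = torch.randn(CONFIG["batch_size"], CONFIG["embedding_dim"])
loss = (model(x) - x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), CONFIG["grad_norm"])
optimizer.step()
```

Only the optimizer, clipping, and regularization settings are represented here; the variational architecture itself and the initialization of VRNMT from a trained conventional NMT model are outside the scope of this sketch.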