Variational Recurrent Neural Machine Translation
Authors: Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, Biao Zhang
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both the conventional and variational NMT models. |
| Researcher Affiliation | Academia | Xiamen University, Xiamen, China; Institute of Software, Chinese Academy of Sciences, Beijing, China; Soochow University, Suzhou, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions open-source or re-implemented systems for comparison (Moses, DL4MT tutorial) but does not provide concrete access to its own source code for the methodology described. |
| Open Datasets | Yes | Our Chinese-English training data consists of 1.25M LDC sentence pairs... In English-German translation, our training data consists of 4.46M sentence pairs... We used the NIST MT02 dataset... and the NIST MT03/04/05/06 datasets... We used the news-test 2013 as the validation set and the news-test 2015 as the test set. |
| Dataset Splits | Yes | We used the NIST MT02 dataset as the validation set... We used the news-test 2013 as the validation set |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only general training parameters. |
| Software Dependencies | No | The paper mentions using Rmsprop and the Moses script but does not provide specific version numbers for these or any other software libraries or frameworks. |
| Experiment Setup | Yes | We applied Rmsprop (Graves 2013) with iterNum=5, momentum=0, ρ=0.95, and ϵ=1×10⁻⁴ to train various NMT models... Specifically, we set word embedding dimension as 620, hidden layer size as 1000, learning rate as 5×10⁻⁴, batch size as 80, gradient norm as 1.0, and dropout rate as 0.3. Particularly, we initialized the parameters of VRNMT with the trained conventional NMT model. As implemented in VAE, we set the sampling number L=1, and d_e = d_z = 2d_f = 2000 according to preliminary experiments. During decoding, we used the beam-search algorithm, and set beam sizes of all models as 10. |
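
The Open Datasets and Dataset Splits rows above fix the train/validation/test partition for both language pairs. A minimal sketch of that split layout as a configuration mapping is given below; the dictionary keys and corpus labels are illustrative assumptions on my part (the LDC and WMT corpora are licensed and distributed separately, so no file paths are implied).

```python
# Hypothetical layout of the splits reported in the paper.
# Corpus labels are descriptive only; actual file names/paths are not given in the paper.
DATASET_SPLITS = {
    "zh-en": {
        "train": "LDC parallel corpus (~1.25M sentence pairs)",
        "dev": "NIST MT02",
        "test": ["NIST MT03", "NIST MT04", "NIST MT05", "NIST MT06"],
    },
    "en-de": {
        "train": "WMT parallel corpus (~4.46M sentence pairs)",
        "dev": "newstest2013",
        "test": ["newstest2015"],
    },
}

if __name__ == "__main__":
    for pair, splits in DATASET_SPLITS.items():
        print(f"{pair}: dev={splits['dev']}, test={splits['test']}")
```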
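
The Experiment Setup row quotes the training and decoding hyperparameters verbatim. Below is a minimal sketch, assuming a PyTorch-style implementation, that collects those reported values into a configuration object and builds an RMSprop optimizer with gradient-norm clipping; the authors trained with their own Rmsprop (Graves 2013) code on top of the DL4MT tutorial system, so `torch.optim.RMSprop` and the helper names here are approximations, not the paper's implementation.

```python
from dataclasses import dataclass

import torch


@dataclass
class VRNMTConfig:
    # Values quoted from the paper's experiment setup.
    emb_dim: int = 620          # word embedding dimension
    hidden_dim: int = 1000      # hidden layer size
    latent_dim: int = 2000      # d_e = d_z = 2 d_f = 2000
    learning_rate: float = 5e-4
    batch_size: int = 80
    grad_norm: float = 1.0      # gradient norm clipping threshold
    dropout: float = 0.3
    rho: float = 0.95           # RMSprop decay (called alpha in PyTorch)
    eps: float = 1e-4
    momentum: float = 0.0
    sample_count: int = 1       # L = 1 Monte Carlo sample for the VAE term
    beam_size: int = 10         # beam size used for all models at decoding time


def build_optimizer(model: torch.nn.Module, cfg: VRNMTConfig) -> torch.optim.Optimizer:
    """RMSprop roughly matching the reported settings (approximation of Graves 2013)."""
    return torch.optim.RMSprop(
        model.parameters(),
        lr=cfg.learning_rate,
        alpha=cfg.rho,
        eps=cfg.eps,
        momentum=cfg.momentum,
    )


def clip_and_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer, cfg: VRNMTConfig) -> None:
    """Apply the reported gradient-norm clipping before each parameter update."""
    torch.nn.utils.clip_grad_norm_(model.parameters(), cfg.grad_norm)
    optimizer.step()
    optimizer.zero_grad()
```

Note that `alpha` in `torch.optim.RMSprop` plays the role of ρ in the paper's notation, and the quoted `iterNum=5` has no direct counterpart in this sketch.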