Decoding with Value Networks for Neural Machine Translation

Authors: Di He, Hanqing Lu, Yingce Xia, Tao Qin, Liwei Wang, Tie-Yan Liu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a set of experiments on several translation tasks. All the results demonstrate the effectiveness and robustness of the new decoding mechanism compared to several baseline algorithms.
Researcher Affiliation | Collaboration | (1) Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; (2) Carnegie Mellon University; (3) University of Science and Technology of China; (4) Microsoft Research; (5) Center for Data Science, Peking University, Beijing Institute of Big Data Research
Pseudocode | Yes | The details of the decoding process are presented in Algorithm 2, and we call our neural network-based decoding algorithm NMT-VNN for short. (A decoding sketch follows the table.)
Open Source Code | No | The paper states that 'For NMT-BS, we directly used the open source code [2]', referring to a baseline's code, but it does not state that the authors' own NMT-VNN code is open-source or publicly available.
Open Datasets | Yes | In detail, we used the same bilingual corpora from WMT'14 as used in [2], which contains 12M, 4.5M and 10M training data for each task.
Dataset Splits | Yes | Following common practices, for En→Fr and En→De, we concatenated newstest2012 and newstest2013 as the validation set, and used newstest2014 as the testing set. For Zh→En, we used the NIST 2006 and NIST 2008 datasets for testing, and the NIST 2004 dataset for validation. (See the split-preparation sketch after the table.)
Hardware Specification | Yes | The NMT model was trained with asynchronous SGD on four K40m GPUs for about seven days. ... the value network model was trained with AdaDelta [21] on one K40m GPU for about three days.
Software Dependencies | No | The paper mentions 'AdaDelta [21]' as the optimization algorithm and the 'multi-bleu.perl' script for evaluation, but does not specify version numbers for general software dependencies such as the programming language or deep learning framework (e.g., Python 3.x, TensorFlow 2.x).
Experiment Setup | Yes | For each language, we constructed the vocabulary with the most common 30K words in the parallel corpora, and out-of-vocabulary words were replaced with a special token "UNK". Each word was embedded into a vector space of 620 dimensions, and the dimension of the recurrent unit was 1000. We removed sentences with more than 50 words from the training set. Batch size was set as 80, with 20 batches pre-fetched and sorted by sentence lengths. (See the preprocessing sketch after the table.)
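
The decoding mechanism quoted in the Pseudocode row combines the translation model's conditional log-probability with a value network that predicts the long-term reward (e.g., expected BLEU) of a partial translation. Below is a minimal Python sketch of that idea, assuming a scoring rule that linearly interpolates the length-normalized log-probability with the log of the value network's output through a weight alpha; `nmt_step`, `value_net` and their interfaces are hypothetical placeholders, not the authors' implementation (which, per the table, was not released).

```python
import math

BOS, EOS = "<s>", "</s>"

def vnn_beam_search(src, nmt_step, value_net, beam_size=12, alpha=0.85, max_len=80):
    """Beam search that ranks hypotheses with both the NMT model and a value network.

    nmt_step(src, prefix)  -> dict mapping next token -> log P(token | src, prefix)  (assumed interface)
    value_net(src, prefix) -> scalar in (0, 1), predicted long-term reward for the partial translation
    alpha                  -> interpolation weight between log-probability and value estimate
    """
    beam = [([BOS], 0.0)]          # (token prefix, cumulative log-probability)
    finished = []

    for _ in range(max_len):
        candidates = []
        for prefix, logp in beam:
            if prefix[-1] == EOS:  # completed hypotheses are set aside, not expanded
                finished.append((prefix, logp))
                continue
            for tok, tok_logp in nmt_step(src, prefix).items():
                candidates.append((prefix + [tok], logp + tok_logp))

        if not candidates:
            break

        # Rank expansions by the interpolated score:
        #   alpha * (length-normalized log-prob) + (1 - alpha) * log(value network estimate)
        def score(cand):
            prefix, logp = cand
            return alpha * logp / len(prefix) + (1 - alpha) * math.log(value_net(src, prefix) + 1e-9)

        beam = sorted(candidates, key=score, reverse=True)[:beam_size]

    finished.extend(c for c in beam if c[0][-1] == EOS)
    pool = finished or beam
    best = max(pool, key=lambda c: alpha * c[1] / len(c[0]) + (1 - alpha) * math.log(value_net(src, c[0]) + 1e-9))
    return best[0]
```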
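
The Dataset Splits row states that the validation set for En→Fr and En→De was obtained by concatenating newstest2012 and newstest2013. A small illustrative sketch of that step is below; the file names are placeholders, since the paper does not release a preprocessing script.

```python
from pathlib import Path

def concat_files(parts, out_path):
    """Concatenate several tokenized text files into a single split file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for part in parts:
            out.write(Path(part).read_text(encoding="utf-8"))

# En->Fr: newstest2012 + newstest2013 form the validation set, newstest2014 is the test set.
# The file names below are illustrative placeholders.
concat_files(["newstest2012.en", "newstest2013.en"], "valid.en")
concat_files(["newstest2012.fr", "newstest2013.fr"], "valid.fr")
```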
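
The Experiment Setup row lists the preprocessing hyperparameters: a 30K-word vocabulary with out-of-vocabulary words mapped to "UNK", removal of training sentences longer than 50 words, and batches of 80 with 20 batches pre-fetched and sorted by sentence length. The sketch below only illustrates those stated choices; tokenization, the 620-dimensional embeddings, and the 1000-dimensional recurrent unit belong to the model code and are not shown.

```python
from collections import Counter

VOCAB_SIZE = 30_000
MAX_LEN = 50
BATCH_SIZE = 80
UNK = "UNK"

def build_vocab(sentences, size=VOCAB_SIZE):
    """Keep the `size` most frequent words; everything else will map to UNK."""
    counts = Counter(w for s in sentences for w in s)
    return {w for w, _ in counts.most_common(size)}

def preprocess(pairs, src_vocab, tgt_vocab):
    """Drop pairs with more than MAX_LEN words and replace out-of-vocabulary words with UNK."""
    kept = []
    for src, tgt in pairs:
        if len(src) > MAX_LEN or len(tgt) > MAX_LEN:
            continue
        src = [w if w in src_vocab else UNK for w in src]
        tgt = [w if w in tgt_vocab else UNK for w in tgt]
        kept.append((src, tgt))
    return kept

def batches(pairs, batch_size=BATCH_SIZE, prefetch=20):
    """Yield batches of `batch_size`; each block of `prefetch` batches is sorted by source length."""
    chunk = batch_size * prefetch
    for i in range(0, len(pairs), chunk):
        block = sorted(pairs[i:i + chunk], key=lambda p: len(p[0]))
        for j in range(0, len(block), batch_size):
            yield block[j:j + batch_size]
```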