Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

Authors: Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, Tie-Yan Liu

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on neural machine translation and text summarization demonstrate the effectiveness of the proposed deliberation networks. On the WMT 2014 English-to-French translation task, our model establishes a new state-of-the-art BLEU score of 41.5.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China; 2 Microsoft Research, Beijing, China; 3 Sun Yat-sen University, Guangzhou, China
Pseudocode | Yes | Algorithm 1: Algorithm to train the deliberation network (a structural sketch follows this table)
Open Source Code | No | The paper does not provide any specific links or explicit statements about releasing the source code for their methodology.
Open Datasets | Yes | For En→Fr, we employ the standard filtered WMT'14 dataset... For Zh→En, we choose 1.25M bilingual sentence pairs from LDC dataset... The training, validation and test sets for the task are extracted from Gigaword Corpus [6]
Dataset Splits | Yes | We concatenate newstest2012 and newstest2013 together as the validation set and use newstest2014 as the test set. For Zh→En, we choose 1.25M bilingual sentence pairs from LDC dataset as training corpus, use NIST2003 as the validation set, and NIST2004, NIST2005, NIST2006, NIST2008 as the test sets.
Hardware Specification | Yes | All the models are trained on a single NVIDIA K40 GPU.
Software Dependencies | No | The paper mentions that the models are "implemented in Theano [24]" but does not specify a version number for Theano or other software dependencies.
Experiment Setup | Yes | The word embedding dimension is set as 620. For Zh→En, we apply 0.5 dropout rate to the layer before softmax and no dropout is used in En→Fr translation. ... Plain SGD is used as the optimizer in this process, with initial learning rate 0.2 and halving according to validation accuracy. To sample the intermediate translation output by the first decoder, we use beam search with beam size 2, considering the tradeoff between accuracy and efficiency. (An optimizer sketch follows this table.)
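The Pseudocode row refers to Algorithm 1, which trains a deliberation network: an encoder, a first-pass decoder that drafts an output while attending to the source, and a second-pass decoder that attends to both the source encodings and the first-pass draft to produce the refined output. Below is a minimal structural sketch of that two-pass layout, assuming PyTorch rather than the paper's Theano implementation; the module choices, the 512-dimensional hidden size, the dot-product attention, and the teacher forcing of both decoders are illustrative assumptions (Algorithm 1 instead samples the intermediate translation from the first decoder and optimizes a Monte Carlo approximation of the second decoder's expected loss). Only the 620-dimensional embeddings come from the paper.

```python
# Structural sketch of a deliberation network (two-pass decoding).
# Assumptions: PyTorch, GRU layers, dot-product attention applied after the
# recurrence, and teacher forcing for both decoders. This is not the paper's
# training procedure, only the draft-then-refine wiring it describes.
import torch
import torch.nn as nn

class DeliberationSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=620, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.first_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.second_decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out1 = nn.Linear(2 * hid_dim, vocab_size)   # first-pass softmax layer
        self.out2 = nn.Linear(3 * hid_dim, vocab_size)   # second-pass softmax layer

    @staticmethod
    def attend(query, memory):
        # Dot-product attention as a stand-in for the paper's additive attention.
        weights = torch.softmax(torch.bmm(query, memory.transpose(1, 2)), dim=-1)
        return torch.bmm(weights, memory)

    def forward(self, src, tgt_in):
        enc, _ = self.encoder(self.embed(src))            # source encodings

        # First pass: the draft decoder attends to the source only.
        h1, _ = self.first_decoder(self.embed(tgt_in))
        ctx_src1 = self.attend(h1, enc)
        logits1 = self.out1(torch.cat([h1, ctx_src1], dim=-1))

        # Second pass: the deliberation decoder attends to the source AND to
        # the first-pass states, refining the draft into the final output.
        h2, _ = self.second_decoder(self.embed(tgt_in))
        ctx_src2 = self.attend(h2, enc)
        ctx_draft = self.attend(h2, h1)
        logits2 = self.out2(torch.cat([h2, ctx_src2, ctx_draft], dim=-1))
        return logits1, logits2
```

In the paper, the two decoders are trained jointly, and the second decoder consumes sampled first-pass translations (beam size 2) rather than teacher-forced target prefixes; the sketch omits that sampling step.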
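The Experiment Setup row quotes the optimizer recipe: plain SGD with initial learning rate 0.2, halved according to validation accuracy. The following is a minimal sketch of that schedule, assuming PyTorch; `ReduceLROnPlateau` is an assumed stand-in for the paper's halving rule, the 30,000-word vocabulary and 20-epoch budget are placeholders, and `train_one_epoch` / `evaluate` are hypothetical helpers.

```python
# Sketch of the reported optimizer setup: plain SGD, initial learning rate 0.2,
# halved when validation accuracy stops improving. The exact halving criterion
# and epoch budget are not given in the paper.
import torch

model = DeliberationSketch(vocab_size=30000)    # sketch model from above; vocab size assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5)          # halve the LR when the metric plateaus

for epoch in range(20):                         # placeholder epoch budget
    train_one_epoch(model, optimizer)           # hypothetical training helper
    val_accuracy = evaluate(model)              # hypothetical validation helper
    scheduler.step(val_accuracy)                # apply the halving schedule
```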