Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Deliberation Networks: Sequence Generation Beyond One-Pass Decoding
Authors: Yingce Xia, Fei Tian, Lijun Wu, Jianxin Lin, Tao Qin, Nenghai Yu, Tie-Yan Liu
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on neural machine translation and text summarization demonstrate the effectiveness of the proposed deliberation networks. On the WMT 2014 English-to-French translation task, our model establishes a new state-of-the-art BLEU score of 41.5. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, Hefei, China 2Microsoft Research, Beijing, China 3Sun Yat-sen University, Guangzhou, China |
| Pseudocode | Yes | Algorithm 1: Algorithm to train the deliberation network |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about releasing the source code for their methodology. |
| Open Datasets | Yes | For En→Fr, we employ the standard filtered WMT'14 dataset... For Zh→En, we choose 1.25M bilingual sentence pairs from LDC dataset... The training, validation and test sets for the task are extracted from Gigaword Corpus [6] |
| Dataset Splits | Yes | We concatenate newstest2012 and newstest2013 together as the validation set and use newstest2014 as the test set. For Zh→En, we choose 1.25M bilingual sentence pairs from LDC dataset as training corpus, use NIST2003 as the validation set, and NIST2004, NIST2005, NIST2006, NIST2008 as the test sets. |
| Hardware Specification | Yes | All the models are trained on a single NVIDIA K40 GPU. |
| Software Dependencies | No | The paper mentions that the models are "implemented in Theano [24]" but does not specify a version number for Theano or other software dependencies. |
| Experiment Setup | Yes | The word embedding dimension is set as 620. For Zh→En, we apply 0.5 dropout rate to the layer before softmax and no dropout is used in En→Fr translation. ... Plain SGD is used as the optimizer in this process, with initial learning rate 0.2 and halving according to validation accuracy. To sample the intermediate translation output by the first decoder, we use beam search with beam size 2, considering the tradeoff between accuracy and efficiency. |
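The learning-rate schedule quoted in the Experiment Setup row (plain SGD starting at 0.2, halved according to validation accuracy) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `next_learning_rate`, the halve-on-no-improvement rule, and the validation scores are all hypothetical.

```python
# Hypothetical sketch of the schedule described in the setup: start SGD at
# lr = 0.2 and halve it whenever validation accuracy fails to improve on the
# best score seen so far. All names and numbers here are illustrative.

def next_learning_rate(lr, val_history):
    """Halve lr when the latest validation score does not beat the best
    previous score; otherwise keep it unchanged."""
    if len(val_history) >= 2 and val_history[-1] <= max(val_history[:-1]):
        return lr / 2.0
    return lr

lr = 0.2
scores = []
for epoch_score in [30.1, 30.8, 30.7, 31.0, 30.9]:  # made-up validation BLEU
    scores.append(epoch_score)
    lr = next_learning_rate(lr, scores)
# lr has been halved twice (after the two non-improving epochs): 0.2 -> 0.1 -> 0.05
```

Modern frameworks expose the same idea as a "reduce on plateau" scheduler; the paper itself only states the initial rate and the halving criterion.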