The Neural Noisy Channel
Authors: Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomas Kocisky
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use. |
| Researcher Affiliation | Collaboration | Lei Yu1, Phil Blunsom1,2, Chris Dyer2, Edward Grefenstette2, and Tomáš Kočiský1,2; 1University of Oxford and 2DeepMind |
| Pseudocode | Yes | Algorithm 1 Noisy Channel Decoding (a decoding sketch follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | The dataset (Rush et al., 2015) that we use is constructed by pairing the first sentence and the headline of each article from the annotated Gigaword corpus (Graff et al., 2003; Napoles et al., 2012). [...] We used parallel data with 184k sentence pairs (from the FBIS corpus, LDC2003E14) and monolingual data with 4.3 million of English sentences (selected from the English Gigaword). [...] The dataset (Durrett & De Nero, 2013) that we use in the experiments is created from Wiktionary [...] Our language models were trained on word types extracted by running a morphological analysis tool on the WMT 2016 monolingual data |
| Dataset Splits | Yes | There are 3.8m, 190k and 381k sentence pairs in the training, validation and test sets, respectively. [...] The train/dev/test split for German nouns is 2364/200/200, and for German verbs is 1617/200/200. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and network architectures (LSTMs) but does not provide specific version numbers for software libraries or frameworks used (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | The loss (Equation 2) is optimized by Adam (Kingma & Ba, 2015), with initial learning rate of 0.001. We use LSTMs with 1 layer for both the encoder and decoders, with hidden units of 256. The mini-batch size is 32, and dropout of 0.2 is applied to the input and output of LSTMs. For the language model, we use a 2-layer LSTM with 1024 hidden units and 0.5 dropout. The learning rate is 0.0001. All the hyperparameters are optimised via grid search on the perplexity of the validation set. During decoding, beam search is employed with the number of proposals generated by the direct model K1 = 20, and the number of best candidates selected by the noisy channel model K2 = 10. (These values are collected in the configuration sketch after the table.) |
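
The Pseudocode row cites Algorithm 1 (Noisy Channel Decoding). Below is a minimal sketch of the two-stage beam search it describes: the direct model proposes K1 extensions per partial hypothesis, and the noisy channel objective (channel model plus language model) keeps the K2 best. The model interfaces (`direct_model.topk_extensions`, `channel_model.score`, `language_model.score`) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of noisy channel beam-search decoding.
# Model interfaces are assumed placeholders, not the paper's API.

def noisy_channel_decode(x, direct_model, channel_model, language_model,
                         k1=20, k2=10, max_len=50, eos="</s>"):
    """Rescore direct-model proposals with the channel model + language model."""
    beam = [([], 0.0)]          # (partial output y, noisy channel score)
    finished = []
    for _ in range(max_len):
        candidates = []
        for y, _ in beam:
            # 1) The direct model q(y|x) proposes K1 next tokens.
            for token in direct_model.topk_extensions(x, y, k=k1):
                y_new = y + [token]
                # 2) Rescore with the noisy channel objective:
                #    log p(x|y) + log p(y) (the paper also interpolates q(y|x)).
                score = (channel_model.score(x, y_new)
                         + language_model.score(y_new))
                candidates.append((y_new, score))
        # 3) Keep the K2 best candidates for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for y_new, score in candidates[:k2]:
            (finished if y_new[-1] == eos else beam).append((y_new, score))
        if not beam:
            break
    best = max(finished or beam, key=lambda c: c[1])
    return best[0]
```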
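
The Experiment Setup row's hyperparameters are restated below as a single configuration dictionary for readability. The key names are illustrative assumptions; only the values come from the quoted setup.

```python
# Hyperparameters quoted in the Experiment Setup row. Key names are
# illustrative; the paper reports only the values.
config = {
    "seq2seq": {
        "optimizer": "Adam",
        "learning_rate": 1e-3,
        "encoder_layers": 1,
        "decoder_layers": 1,
        "hidden_units": 256,
        "batch_size": 32,
        "dropout": 0.2,        # applied to LSTM input and output
    },
    "language_model": {
        "layers": 2,
        "hidden_units": 1024,
        "dropout": 0.5,
        "learning_rate": 1e-4,
    },
    "decoding": {
        "k1": 20,   # proposals generated by the direct model
        "k2": 10,   # best candidates kept by the noisy channel model
    },
}
```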