Noisin: Unbiased Regularization for Recurrent Neural Networks

Authors: Adji Bousso Dieng, Rajesh Ranganath, Jaan Altosaar, David Blei

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On language modeling benchmarks, Noisin improves over dropout by as much as 12.2% on the Penn Treebank and 9.4% on the Wikitext-2 dataset. We also compared the state-of-the-art language model of Yang et al. (2017), both with and without Noisin. On the Penn Treebank, the method with Noisin more quickly reaches state-of-the-art performance. (Section 5: Empirical Study)
Researcher Affiliation | Academia | Columbia University, New York University, Princeton University.
Pseudocode | Yes | Algorithm 1: Noisin with multiplicative noise. (An illustrative model sketch follows the table.)
Open Source Code | No | The models were implemented in PyTorch. The source code is available upon request. (Explanation: The code is stated to be 'available upon request', which does not constitute concrete public access.)
Open Datasets | Yes | The Penn Treebank portion of the Wall Street Journal (Marcus et al., 1993) is a long standing benchmark dataset for language modeling. We use the standard split, where sections 0 to 20 (930K tokens) are used for training, sections 21 to 22 (74K tokens) for validation, and sections 23 to 24 (82K tokens) for testing (Mikolov et al., 2010). The Wikitext-2 dataset (Merity et al., 2016) has been recently introduced as an alternative to the Penn Treebank dataset.
Dataset Splits | Yes | We use the standard split, where sections 0 to 20 (930K tokens) are used for training, sections 21 to 22 (74K tokens) for validation, and sections 23 to 24 (82K tokens) for testing (Mikolov et al., 2010). (An illustrative loading sketch follows the table.)
Hardware Specification | No | We thank the Princeton Institute for Computational Science and Engineering (PICSciE), the Office of Information Technology's High Performance Computing Center and Visualization Laboratory at Princeton University for the computational resources. (Explanation: This statement acknowledges computational resources but does not specify any particular hardware, such as GPU/CPU models, processor speeds, or memory configurations.)
Software Dependencies | No | The models were implemented in PyTorch. (Explanation: The paper mentions PyTorch but does not specify a version number or list other software dependencies with their versions.)
Experiment Setup | Yes | We considered two settings in our experiments: a medium-sized network and a large network. The medium-sized network has 2 layers with 650 hidden units each. ... The large network has 2 layers with 1500 hidden units each. ... We train the models using truncated backpropagation through time... for a maximum of 200 epochs. The LSTM was unrolled for 35 steps. We used a batch size of 80 for both datasets. To avoid the problem of exploding gradients we clip the gradients to a maximum norm of 0.25. We used an initial learning rate of 30 for all experiments. This is divided by a factor of 1.2 if the perplexity on the validation set deteriorates. For the dropout-LSTM, the values used for dropout on the input, recurrent, and output layers were 0.5, 0.4, 0.5 respectively. (An illustrative training-loop sketch follows the table.)
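
The Pseudocode row points to Algorithm 1, Noisin with multiplicative noise: at each time step the hidden state is multiplied by noise whose mean is 1, so the perturbed state equals the original state in expectation. The minimal PyTorch sketch below illustrates that idea only; the class name NoisinLSTM, the Gaussian noise with variance gamma, and the single-layer LSTMCell are assumptions made here for brevity, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class NoisinLSTM(nn.Module):
    """Minimal sketch of an LSTM whose hidden state is perturbed with
    unbiased multiplicative noise at every time step, in the spirit of
    Algorithm 1. Illustrative only, not the authors' code."""

    def __init__(self, vocab_size, emb_size, hidden_size, gamma=0.1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.cell = nn.LSTMCell(emb_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)
        self.gamma = gamma  # assumed noise-variance hyperparameter

    def forward(self, tokens, state=None):
        # tokens: (seq_len, batch) of word indices
        emb = self.embed(tokens)
        seq_len, batch, _ = emb.shape
        if state is None:
            h = emb.new_zeros(batch, self.cell.hidden_size)
            c = emb.new_zeros(batch, self.cell.hidden_size)
        else:
            h, c = state
        logits = []
        for t in range(seq_len):
            h, c = self.cell(emb[t], (h, c))
            if self.training:
                # Multiplicative noise with mean 1 keeps E[noised h] = h,
                # which is the "unbiased" property Noisin requires.
                eps = 1.0 + self.gamma ** 0.5 * torch.randn_like(h)
                h = h * eps
            logits.append(self.decoder(h))
        return torch.stack(logits), (h, c)
```

Sampling a fresh noise tensor at every time step (rather than once per sequence) matches the per-step injection described in the paper; any other distribution with unit mean would fit the same template.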
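
The Open Datasets and Dataset Splits rows describe the standard Penn Treebank split. A common way to consume it, assuming the widely used Mikolov preprocessing with files named ptb.train.txt, ptb.valid.txt, and ptb.test.txt (file names not stated in the paper), is sketched below; the `<eos>` marker and the vocabulary built from the training split are conventions assumed here.

```python
from collections import Counter
import torch


def load_split(path, vocab=None):
    """Read a whitespace-tokenized language-modeling split (one sentence
    per line) into a flat LongTensor of word ids, building the vocabulary
    from this file when none is given."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            words.extend(line.split() + ["<eos>"])  # assumed end-of-sentence token
    if vocab is None:
        vocab = {w: i for i, (w, _) in enumerate(Counter(words).most_common())}
    # Words outside the training vocabulary are dropped; the preprocessed
    # Penn Treebank already maps rare words to <unk>, so little is lost.
    ids = torch.tensor([vocab[w] for w in words if w in vocab], dtype=torch.long)
    return ids, vocab


# Hypothetical file names from the common Mikolov preprocessing;
# the paper only states the section-level split.
train_ids, vocab = load_split("ptb.train.txt")
valid_ids, _ = load_split("ptb.valid.txt", vocab)
test_ids, _ = load_split("ptb.test.txt", vocab)
```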
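
The Experiment Setup row fixes the optimization recipe: truncated backpropagation through time over 35-step segments, batch size 80, gradient clipping at 0.25, SGD starting at learning rate 30, and division by 1.2 whenever validation perplexity deteriorates, for at most 200 epochs. The sketch below wires those quoted numbers into a generic PyTorch loop; make_batches, validate, and the (logits, state) model interface are hypothetical plumbing, not the paper's code.

```python
import torch

# Hyperparameters quoted in the Experiment Setup row (medium network;
# the large network uses 1500 hidden units instead of 650).
CLIP_NORM = 0.25
INITIAL_LR = 30.0
LR_DECAY = 1.2        # divide the lr when validation perplexity worsens
MAX_EPOCHS = 200
BPTT_STEPS = 35       # LSTM unrolled for 35 steps
BATCH_SIZE = 80       # same batch size for both datasets


def train_epoch(model, batches, criterion, lr):
    """One epoch of truncated BPTT with plain SGD and gradient clipping.
    `batches` is any iterable of (tokens, targets) minibatches shaped
    (BPTT_STEPS, BATCH_SIZE); a sketch, not the released training code."""
    model.train()
    state = None
    for tokens, targets in batches:
        if state is not None:
            # Detach to truncate backpropagation at the segment boundary.
            state = tuple(s.detach() for s in state)
        model.zero_grad()
        logits, state = model(tokens, state)
        loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is not None:
                    p.add_(p.grad, alpha=-lr)   # SGD step at the quoted lr


def fit(model, make_batches, criterion, validate):
    """Outer loop: up to 200 epochs, dividing the learning rate by 1.2
    whenever validation perplexity fails to improve. `make_batches` and
    `validate` are hypothetical callables supplied by the caller."""
    lr, best_ppl = INITIAL_LR, float("inf")
    for _ in range(MAX_EPOCHS):
        train_epoch(model, make_batches(), criterion, lr)
        ppl = validate(model)
        if ppl >= best_ppl:
            lr /= LR_DECAY
        best_ppl = min(best_ppl, ppl)
    return model
```

With the NoisinLSTM sketch above and an nn.CrossEntropyLoss() criterion, fit(model, make_batches, criterion, validate) follows the schedule described in the quote, though it is not expected to reproduce the reported numbers exactly.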