Noisy Activation Functions
Authors: Caglar Gulcehre, Marcin Moczulski, Misha Denil, Yoshua Bengio
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and tasks, especially when training seems to be the most difficult, e.g., when curriculum learning is necessary to obtain good results. |
| Researcher Affiliation | Academia | Caglar Gulcehre GULCEHRC@IRO.UMONTREAL.CA Marcin Moczulski MARCIN.MOCZULSKI@STCATZ.OX.AC.UK Misha Denil MISHA.DENIL@GMAIL.COM Yoshua Bengio BENGIOY@IRO.UMONTREAL.CA University of Montreal University of Oxford |
| Pseudocode | Yes | Algorithm 1 Noisy Activations with Half-Normal Noise for Hard-Saturating Functions |
| Open Source Code | Yes | Codes for different types of noisy activation functions can be found at https://github.com/caglar/noisy_units. |
| Open Datasets | Yes | We trained a 2 layer word-level LSTM language model on Penntreebank. We used the same model proposed by Zaremba et al. (2014). |
| Dataset Splits | No | The paper mentions "validation perplexity" and uses validation sets, but it does not specify split percentages or example counts for any dataset, so the data partitioning cannot be reproduced. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions Theano in the acknowledgements but does not provide specific version numbers for it or any other software libraries used in the experiments. |
| Experiment Setup | Yes | We changed the default gradient clipping to 5 from 10 in order to avoid numerical stability problems. ... In order to anneal the noise, we started training with the scale hyperparameter of the standard deviation of noise with c = 30 and annealed it down to 0.5 with the schedule of c/(t+1), where t is incremented at every 200 minibatch updates. |
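The pseudocode row above refers to the paper's Algorithm 1 (noisy activations with half-normal noise for hard-saturating functions). As a rough, hedged sketch of the idea in NumPy: noise is injected only where the hard-saturating nonlinearity deviates from its linearization, and is pushed back toward the linear regime. The hyperparameter values (`alpha`, `c`, `p`) below are illustrative assumptions, not the values used in the paper's experiments.

```python
import numpy as np

def hard_tanh(x):
    # Hard-saturating tanh: linear on [-1, 1], clipped outside.
    return np.clip(x, -1.0, 1.0)

def noisy_hard_tanh(x, alpha=1.15, c=0.5, p=1.0, rng=None):
    """Sketch of a noisy activation with half-normal noise.

    alpha, c, p are illustrative hyperparameters (assumptions);
    see the paper's Algorithm 1 and released code for the real recipe.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = x                       # linearization of tanh around 0
    h = hard_tanh(x)
    delta = h - u               # zero inside the linear regime
    # Noise std is nonzero only where the unit saturates (delta != 0).
    sigma = c * (1.0 / (1.0 + np.exp(-p * delta)) - 0.5) ** 2
    # Direction that pushes the noise back toward the linear regime.
    d = -np.sign(x) * np.sign(1.0 - alpha)
    xi = np.abs(rng.standard_normal(np.shape(x)))  # half-normal noise
    return alpha * h + (1.0 - alpha) * u + d * sigma * xi
```

Inside the linear regime (|x| < 1) the sketch reduces exactly to the identity, since delta = 0 makes sigma vanish and h = u = x.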
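The annealing schedule quoted in the experiment-setup row can be sketched as follows. The decay form c/(t+1) is an assumption reconstructed from the garbled extracted text, and the floor of 0.5 follows the quoted "annealed it down to 0.5"; treat both as a reading of the quote, not a verified reimplementation.

```python
def noise_scale(update, c=30.0, floor=0.5, every=200):
    """Anneal the noise-scale hyperparameter during training.

    `update` is the minibatch-update counter; t increments once per
    `every` updates (per the quote). The c/(t+1) form is an assumption.
    """
    t = update // every
    return max(c / (t + 1), floor)
```

For example, the scale starts at 30.0, halves to 15.0 after the first 200 updates, and bottoms out at the 0.5 floor late in training.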