An empirical analysis of dropout in piecewise linear networks

Authors: David Warde-Farley; Ian J. Goodfellow; Aaron Courville; Yoshua Bengio

ICLR 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function.
Researcher Affiliation | Academia | Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC H3C 3J7; {wardefar,goodfeli}@iro.umontreal.ca, {aaron.courville,yoshua.bengio}@umontreal.ca
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to source code for its methodology. It mentions using third-party libraries such as Theano and pylearn2 but does not state that the authors' own implementation is available.
Open Datasets | Yes | we chose four binary sub-tasks from the MNIST handwritten digit database (Le Cun et al., 1998). We also chose two binary sub-tasks from the Cover Type dataset of the UCI Machine Learning Repository.
Dataset Splits | Yes | Our training sets consisted of all occurrences of two digit classes (1 vs. 7, 1 vs. 8, 0 vs. 8, and 2 vs. 3) within the first 50,000 examples of the MNIST training set, with the occurrences from the last 10,000 examples held back as a validation set. An additional 500 points were sampled for a validation set and another 1000 as a test set. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper mentions 'Compute Canada, and Calcul Québec for providing computational resources' but does not specify any particular hardware details such as GPU/CPU models, memory, or specific machine configurations used for running experiments.
Software Dependencies | No | The paper mentions using 'Theano' and 'pylearn2' but does not provide specific version numbers for these software dependencies, only citations to the papers introducing them.
Experiment Setup | Yes | Our initial investigations employed rectifier networks with 2 hidden layers and 10 hidden units per layer, and a single logistic sigmoid output unit. We chose hyperparameters by random search (Bergstra and Bengio, 2012) over learning rate and momentum (initial values and decrease/increase schedules, respectively), as well as mini-batch size. We performed early stopping on the validation set, terminating when a lower validation error had not been observed for 100 epochs. (A network and dropout sketch follows the table.)
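
The binary sub-task construction quoted in the Open Datasets and Dataset Splits rows can be made concrete with a short sketch. The snippet below is not the authors' code; it assumes the MNIST training images and labels have already been loaded as NumPy arrays `X` (60000 x 784) and `y` (60000,), and the array names, the digit pair, and the loading step are illustrative assumptions. It follows the split described in the paper: occurrences of the two chosen digit classes within the first 50,000 training examples form the training set, with occurrences from the last 10,000 held back for validation.

```python
# Minimal sketch of building a binary MNIST sub-task (e.g. 1 vs. 7), assuming
# MNIST arrays `X` and `y` are already loaded; not the authors' implementation.
import numpy as np

def make_binary_subtask(X, y, pos_digit, neg_digit):
    """Training set: occurrences of the two classes in the first 50,000 MNIST
    training examples. Validation set: occurrences in the last 10,000."""
    train_mask = np.isin(y[:50000], (pos_digit, neg_digit))
    valid_mask = np.isin(y[50000:], (pos_digit, neg_digit))

    X_train = X[:50000][train_mask]
    y_train = (y[:50000][train_mask] == pos_digit).astype(np.float32)
    X_valid = X[50000:][valid_mask]
    y_valid = (y[50000:][valid_mask] == pos_digit).astype(np.float32)
    return (X_train, y_train), (X_valid, y_valid)

# Example (requires X, y to be loaded): the 1 vs. 7 sub-task.
# (train_X, train_y), (valid_X, valid_y) = make_binary_subtask(X, y, 1, 7)
```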
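For the Experiment Setup row, the sketch below illustrates the described architecture in plain NumPy: two hidden layers of 10 rectified linear units feeding a single logistic sigmoid output, with standard dropout on the hidden layers at training time and the usual weight-scaling approximation at test time. This is not the authors' Theano/pylearn2 implementation; the retain probability of 0.5, the weight initialization scale, and the batch of random inputs are illustrative assumptions, and the SGD-with-momentum training loop with its 100-epoch early-stopping patience is omitted.

```python
# Sketch of a 784 -> 10 -> 10 -> 1 rectifier network with dropout on the
# hidden layers; hyperparameter values are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale=0.01):
    # Small random weights and zero biases (illustrative initialization).
    return scale * rng.standard_normal((n_in, n_out)), np.zeros(n_out)

params = [init_layer(784, 10), init_layer(10, 10), init_layer(10, 1)]

def forward(x, params, train=True, p_retain=0.5):
    """Forward pass. During training each hidden unit is kept with probability
    p_retain; at test time hidden activations are scaled by p_retain instead."""
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i < len(params) - 1:          # hidden layers: ReLU + dropout
            h = np.maximum(z, 0.0)
            if train:
                h = h * rng.binomial(1, p_retain, size=h.shape)
            else:
                h = h * p_retain
        else:                            # output layer: logistic sigmoid
            h = 1.0 / (1.0 + np.exp(-z))
    return h

# Example: one forward pass on a batch of 32 random "images".
x = rng.random((32, 784))
train_probs = forward(x, params, train=True)
test_probs = forward(x, params, train=False)
```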