An empirical analysis of dropout in piecewise linear networks

Authors: David Warde-Farley; Ian J. Goodfellow; Aaron Courville; Yoshua Bengio

ICLR 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function.
Researcher Affiliation | Academia | Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC H3C 3J7; {wardefar,goodfeli}@iro.umontreal.ca, {aaron.courville,yoshua.bengio}@umontreal.ca
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to source code for its methodology. It mentions using third-party libraries such as Theano and pylearn2 but does not state that the authors' own implementation is available.
Open Datasets | Yes | we chose four binary sub-tasks from the MNIST handwritten digit database (Le Cun et al., 1998). We also chose two binary sub-tasks from the Cover Type dataset of the UCI Machine Learning Repository.
Dataset Splits | Yes | Our training sets consisted of all occurrences of two digit classes (1 vs. 7, 1 vs. 8, 0 vs. 8, and 2 vs. 3) within the first 50,000 examples of the MNIST training set, with the occurrences from the last 10,000 examples held back as a validation set. An additional 500 points were sampled for a validation set and another 1000 as a test set. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper mentions 'Compute Canada, and Calcul Québec for providing computational resources' but does not specify any particular hardware details such as GPU/CPU models, memory, or specific machine configurations used for running experiments.
Software Dependencies | No | The paper mentions using 'Theano' and 'pylearn2' but does not provide specific version numbers for these software dependencies, only citations to the papers introducing them.
Experiment Setup | Yes | Our initial investigations employed rectifier networks with 2 hidden layers and 10 hidden units per layer, and a single logistic sigmoid output unit. We chose hyperparameters by random search (Bergstra and Bengio, 2012) over learning rate and momentum (initial values and decrease/increase schedules, respectively), as well as mini-batch size. We performed early stopping on the validation set, terminating when a lower validation error had not been observed for 100 epochs. (A network and dropout sketch follows the table.)
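
The binary sub-task construction quoted in the Open Datasets and Dataset Splits rows can be made concrete with a short sketch. The snippet below is not the authors' code; it assumes the MNIST training images and labels have already been loaded as NumPy arrays `X` (60000 x 784) and `y` (60000,), and the array names, the digit pair, and the loading step are illustrative assumptions. It follows the split described in the paper: occurrences of the two chosen digit classes within the first 50,000 training examples form the training set, with occurrences from the last 10,000 held back for validation.

```python
# Minimal sketch of building a binary MNIST sub-task (e.g. 1 vs. 7), assuming
# MNIST arrays `X` and `y` are already loaded; not the authors' implementation.
import numpy as np

def make_binary_subtask(X, y, pos_digit, neg_digit):
    """Training set: occurrences of the two classes in the first 50,000 MNIST
    training examples. Validation set: occurrences in the last 10,000."""
    train_mask = np.isin(y[:50000], (pos_digit, neg_digit))
    valid_mask = np.isin(y[50000:], (pos_digit, neg_digit))

    X_train = X[:50000][train_mask]
    y_train = (y[:50000][train_mask] == pos_digit).astype(np.float32)
    X_valid = X[50000:][valid_mask]
    y_valid = (y[50000:][valid_mask] == pos_digit).astype(np.float32)
    return (X_train, y_train), (X_valid, y_valid)

# Example (requires X, y to be loaded): the 1 vs. 7 sub-task.
# (train_X, train_y), (valid_X, valid_y) = make_binary_subtask(X, y, 1, 7)
```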
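For the Experiment Setup row, the sketch below illustrates the described architecture in plain NumPy: two hidden layers of 10 rectified linear units feeding a single logistic sigmoid output, with standard dropout on the hidden layers at training time and the usual weight-scaling approximation at test time. This is not the authors' Theano/pylearn2 implementation; the retain probability of 0.5, the weight initialization scale, and the batch of random inputs are illustrative assumptions, and the SGD-with-momentum training loop with its 100-epoch early-stopping patience is omitted.

```python
# Sketch of a 784 -> 10 -> 10 -> 1 rectifier network with dropout on the
# hidden layers; hyperparameter values are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out, scale=0.01):
    # Small random weights and zero biases (illustrative initialization).
    return scale * rng.standard_normal((n_in, n_out)), np.zeros(n_out)

params = [init_layer(784, 10), init_layer(10, 10), init_layer(10, 1)]

def forward(x, params, train=True, p_retain=0.5):
    """Forward pass. During training each hidden unit is kept with probability
    p_retain; at test time hidden activations are scaled by p_retain instead."""
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i < len(params) - 1:          # hidden layers: ReLU + dropout
            h = np.maximum(z, 0.0)
            if train:
                h = h * rng.binomial(1, p_retain, size=h.shape)
            else:
                h = h * p_retain
        else:                            # output layer: logistic sigmoid
            h = 1.0 / (1.0 + np.exp(-z))
    return h

# Example: one forward pass on a batch of 32 random "images".
x = rng.random((32, 784))
train_probs = forward(x, params, train=True)
test_probs = forward(x, params, train=False)
```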