An empirical analysis of dropout in piecewise linear networks
Authors: David Warde-Farley; Ian J. Goodfellow; Aaron Courville; Yoshua Bengio
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function. (A minimal sketch of dropout in a rectifier network appears below the table.) |
| Researcher Affiliation | Academia | Département d'informatique et de recherche opérationnelle, Université de Montréal, Montréal, QC H3C 3J7 {wardefar,goodfeli}@iro.umontreal.ca, {aaron.courville,yoshua.bengio}@umontreal.ca |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using third-party libraries like Theano and pylearn2 but does not state that the authors' own implementation code is available. |
| Open Datasets | Yes | we chose four binary sub-tasks from the MNIST handwritten digit database (LeCun et al., 1998). We also chose two binary sub-tasks from the Cover Type dataset of the UCI Machine Learning Repository |
| Dataset Splits | Yes | Our training sets consisted of all occurrences of two digit classes (1 vs. 7, 1 vs. 8, 0 vs. 8, and 2 vs. 3) within the first 50,000 examples of the MNIST training set, with the occurrences from the last 10,000 examples held back as a validation set. An additional 500 points [for the Cover Type sub-tasks] were sampled for a validation set and another 1000 as a test set. (A sketch of the MNIST split construction appears below the table.) |
| Hardware Specification | No | The paper mentions 'Compute Canada, and Calcul Québec for providing computational resources' but does not specify any particular hardware details such as GPU/CPU models, memory, or specific machine configurations used for running experiments. |
| Software Dependencies | No | The paper mentions using 'Theano' and 'pylearn2' but does not provide specific version numbers for these software dependencies, only citations to the papers introducing them. |
| Experiment Setup | Yes | Our initial investigations employed rectifier networks with 2 hidden layers and 10 hidden units per layer, and a single logistic sigmoid output unit. We chose hyperparameters by random search (Bergstra and Bengio, 2012) over learning rate and momentum (initial values and decrease/increase schedules, respectively), as well as mini-batch size. We performed early stopping on the validation set, terminating when a lower validation error had not been observed for 100 epochs. (A sketch of this search-and-early-stopping loop appears below the table.) |
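
To make the dropout-plus-rectifier setting in the Research Type row concrete, here is a minimal NumPy sketch of a forward pass through a small rectifier network with dropout at training time and the weight-scaling approximation at test time. This is not the authors' Theano/pylearn2 implementation; the layer sizes follow the Experiment Setup row, while the initialization scale and the batch of random inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases, p_keep=0.5, train=True):
    """Forward pass through a rectifier network with dropout.

    At training time each hidden unit is kept with probability p_keep;
    at test time activations are scaled by p_keep instead (the standard
    weight-scaling approximation studied in the paper).
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
        if train:
            h = h * (rng.random(h.shape) < p_keep)  # sample a fresh dropout mask
        else:
            h = h * p_keep  # deterministic weight-scaling approximation
    # single logistic sigmoid output unit, as in the paper's small networks
    z = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))

# A network matching the paper's initial setup: 2 hidden layers of 10
# rectifier units and one sigmoid output (784 inputs, i.e. MNIST pixels).
sizes = [784, 10, 10, 1]
weights = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
probs = forward(rng.standard_normal((5, 784)), weights, biases, train=True)
```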
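The MNIST splits quoted in the Dataset Splits row can be reproduced mechanically. The sketch below assumes the 60,000-example MNIST training set is already loaded as NumPy arrays `X` (images) and `y` (integer labels); the function name `binary_subtask` is ours, not the paper's.

```python
import numpy as np

def binary_subtask(X, y, pos_digit, neg_digit):
    """Build one binary MNIST sub-task following the quoted splits:
    all occurrences of the two digit classes within the first 50,000
    training examples form the training set, and occurrences within
    the last 10,000 form the validation set."""
    def select(lo, hi):
        Xs, ys = X[lo:hi], y[lo:hi]
        keep = (ys == pos_digit) | (ys == neg_digit)
        return Xs[keep], (ys[keep] == pos_digit).astype(np.int64)
    train = select(0, 50_000)
    valid = select(50_000, 60_000)
    return train, valid

# The four tasks named in the Dataset Splits row:
# tasks = [binary_subtask(X, y, a, b) for a, b in [(1, 7), (1, 8), (0, 8), (2, 3)]]
```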
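The Experiment Setup row describes random search over learning rate, momentum, and mini-batch size combined with patience-based early stopping. The sketch below shows the shape of such a loop; `train_one_epoch` and `valid_error` are hypothetical hooks, the search ranges are assumptions (the paper does not report them), and the learning-rate/momentum schedules mentioned in the quote are omitted for brevity.

```python
import math
import random

def random_search(train_one_epoch, valid_error, n_trials=25,
                  max_epochs=10_000, patience=100, seed=0):
    """Random hyperparameter search with validation-based early stopping.

    train_one_epoch(hp) and valid_error() stand in for one pass of SGD
    and a validation-error measurement. All ranges below are assumptions,
    not values reported in the paper.
    """
    rng = random.Random(seed)
    best_err, best_hp = math.inf, None
    for _ in range(n_trials):
        hp = {
            "learning_rate": 10.0 ** rng.uniform(-4, -1),  # log-uniform (assumed)
            "momentum": rng.uniform(0.5, 0.99),             # assumed range
            "batch_size": rng.choice([16, 32, 64, 128]),    # assumed choices
        }
        trial_best, stale = math.inf, 0
        for _epoch in range(max_epochs):
            train_one_epoch(hp)
            err = valid_error()
            if err < trial_best:
                trial_best, stale = err, 0
            else:
                stale += 1
            if stale >= patience:  # no lower validation error in 100 epochs: stop
                break
        if trial_best < best_err:
            best_err, best_hp = trial_best, hp
    return best_err, best_hp
```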