Qualitatively characterizing neural network optimization problems

Authors: Ian Goodfellow, Oriol Vinyals, and Andrew Saxe

ICLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state of the art neural networks never encounter any significant obstacles. In this paper, we present a variety of simple experiments designed to roughly characterize the objective functions involved in neural network training. (A minimal sketch of this linear-interpolation analysis appears after the table.)
Researcher Affiliation | Collaboration | Google Inc., Mountain View, CA; Department of Electrical Engineering, Stanford University, Stanford, CA; {goodfellow,vinyals}@google.com, asaxe@stanford.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Maxout network: This model was retrained using the publicly available implementation used by Goodfellow et al. (2013c). The code is available at: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/papers/maxout/mnist_pi.yaml (a sketch of how such a Pylearn2 YAML config is typically launched appears after the table).
Open Datasets | Yes | For these experiments we use the MNIST dataset (LeCun et al., 1998). The linear interpolation experiment for a convolutional maxout network on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). LSTM regularized with dropout (Hochreiter & Schmidhuber, 1997; Zaremba et al., 2014) on the Penn Treebank dataset (Marcus et al., 1993).
Dataset Splits | No | The paper mentions the use of a "validation set" in figures (e.g., "J(θ) validation" in Figure 1) and in the text (e.g., "early stopping on a validation set criterion"). However, it does not provide specific details on the split percentages or sizes of the validation set for reproduction.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, number of machines) used to run the experiments.
Software Dependencies | No | The paper acknowledges the developers of "Theano (Bergstra et al., 2010; Bastien et al., 2012) and Pylearn2 (Goodfellow et al., 2013b)" but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | A EXPERIMENT DETAILS: All of our experiments except for the sigmoid network were using hyperparameters taken directly from the literature. We fully specify each of them here. Adversarially trained maxout network: This model is the one used by Goodfellow et al. (2014). There is no public configuration for it, but the paper describes how to modify the previous best maxout network to obtain it. ReLU network without dropout: We simply removed the dropout from the preceding configuration file.
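
The analysis the abstract refers to evaluates the objective J(θ) along the straight line between the initial parameters θ_0 and the learned parameters θ_1, i.e. θ(α) = (1 - α)θ_0 + αθ_1 for α in [0, 1]. The snippet below is a minimal sketch of that procedure, not the authors' code: it uses PyTorch rather than the Theano/Pylearn2 stack from the paper, and model, loss_fn, and data_loader are hypothetical placeholders supplied by the caller.

    # Minimal sketch (not the authors' code) of the paper's linear-interpolation analysis:
    # evaluate the loss along theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_1.
    import copy
    import torch

    def interpolate_loss(model, theta_0, theta_1, loss_fn, data_loader, num_points=25):
        """Return (alphas, losses) for the objective along the straight path theta_0 -> theta_1."""
        alphas = torch.linspace(0.0, 1.0, num_points)
        losses = []
        probe = copy.deepcopy(model)  # scratch copy so the trained model is left untouched
        probe.eval()
        with torch.no_grad():
            for alpha in alphas:
                # Set every parameter to its convex combination for this alpha.
                for p, p0, p1 in zip(probe.parameters(), theta_0, theta_1):
                    p.copy_((1.0 - alpha) * p0 + alpha * p1)
                # Average the objective over the evaluation data.
                total, count = 0.0, 0
                for inputs, targets in data_loader:
                    total += loss_fn(probe(inputs), targets).item() * targets.shape[0]
                    count += targets.shape[0]
                losses.append(total / count)
        return alphas.tolist(), losses

Here θ_0 and θ_1 would be parameter snapshots (e.g. detached clones of model.parameters()) taken at initialization and after training; plotting losses against alphas gives the one-dimensional cross-section of the objective that the paper inspects for obstacles between initialization and solution.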
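
For the Open Source Code row, mnist_pi.yaml is a Pylearn2 experiment description. The lines below are a hedged sketch of how such a config is conventionally launched with Pylearn2's yaml_parse API; this is not taken from the paper, and it assumes Pylearn2 and Theano are installed and that the PYLEARN2_DATA_PATH environment variable points at a local MNIST copy.

    # Hedged sketch: launching a Pylearn2 YAML experiment such as mnist_pi.yaml.
    # Assumes Pylearn2/Theano are installed and PYLEARN2_DATA_PATH is set.
    from pylearn2.config import yaml_parse

    with open("mnist_pi.yaml") as f:       # config fetched from the pylearn2 repository
        train = yaml_parse.load(f.read())  # instantiates dataset, model, and training algorithm
    train.main_loop()                      # runs training as the YAML specifies

The ReLU-network-without-dropout variant described under Experiment Setup would amount to removing the dropout-related entries from such a YAML file before loading it.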