Adaptive Normalized Risk-Averting Training for Deep Neural Networks

Authors: Zhiguang Wang, Tim Oates, James Lo

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In practice, we show how this training method is successfully applied for improved training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR10 datasets. Using simple experimental settings without pretraining and other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard Conv Nets + MSE/cross entropy.
Researcher Affiliation | Academia | Zhiguang Wang, Tim Oates (Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County; zgwang813@gmail.com, oates@umbc.edu); James Lo (Department of Mathematics and Statistics, University of Maryland, Baltimore County; jameslo@umbc.edu)
Pseudocode | No | The paper includes mathematical equations for the loss function and gradients but does not provide any structured pseudocode or algorithm blocks. (An illustrative, non-authoritative loss sketch appears after this table.)
Open Source Code | No | The paper mentions implementation in Python and Theano, and adaptation of pylearn2, but does not provide any link or explicit statement about releasing the source code for the methodology described.
Open Datasets | Yes | In practice, we show how this training method is successfully applied for improved training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR10 datasets. The MNIST dataset (Le Cun et al. 1998) consists of handwritten digits 0-9 which are 28x28 in size. The CIFAR-10 dataset (Krizhevsky and Hinton 2009) is composed of 10 classes of natural images.
Dataset Splits | Yes | We use the last 10,000 images of the training set as validation data to select the hyperparameters and report performance on the test set. (See the split sketch after this table.)
Hardware Specification | No | The paper mentions "GPU acceleration" but does not specify any particular GPU model, CPU, or other hardware components used for running the experiments.
Software Dependencies | No | The paper mentions using "Python and Theano" and adapting "pylearn2", but it does not specify any version numbers for these software dependencies.
Experiment Setup | Yes | The learning rate and penalty weight a are selected in {1, 0.5, 0.1} and {1, 0.1, 0.001} on validation sets respectively. The initial λ is fixed at 10. For the shallow MLPs, we follow the network layout as in (Gui, Lo, and Peng 2014; Le Cun et al. 1998) that has only one hidden layer with 300 neurons. Dropout was applied to all the layers of the network with the probability of retaining a hidden unit being p = (0.9, 0.75, 0.5, 0.5, 0.5) for the different layers of the network. (See the setup sketch after this table.)
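
Illustrative sketches for the rows above follow. First, for the Pseudocode row: the paper states its loss and gradients only as equations, so the block below is not the authors' algorithm. It is a minimal numpy sketch of a normalized risk-averting error (NRAE) of the log-sum-exp form from Lo's earlier convexification work, assuming per-example squared errors and a fixed convexity index λ; the paper's adaptive schedule for λ and its penalty term (with weight a) are not reproduced here.

```python
import numpy as np

def nrae_loss(y_pred, y_true, lam=10.0):
    """Sketch of a normalized risk-averting error (NRAE).

    Computes (1/lam) * log( (1/K) * sum_k exp(lam * e_k) ), where e_k is
    the squared error of example k. As lam -> 0 this approaches the mean
    squared error; large lam emphasizes the worst-fit examples. This is
    the generic NRAE family, not the paper's exact adaptive criterion.
    """
    e = np.sum((y_pred - y_true) ** 2, axis=1)      # per-example squared error
    K = e.shape[0]
    m = np.max(lam * e)                             # log-sum-exp stabilizer
    lse = m + np.log(np.sum(np.exp(lam * e - m)))   # log sum_k exp(lam * e_k)
    return (lse - np.log(K)) / lam
```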
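
For the Dataset Splits row, a minimal sketch of the hold-out split, assuming train_x and train_y are already-loaded numpy arrays of the 60,000 MNIST training examples (the loading code is not specified by the paper); the last 10,000 examples become the validation set, leaving 50,000 for training.

```python
def holdout_split(train_x, train_y, n_valid=10000):
    """Hold out the last `n_valid` training examples for hyperparameter
    selection, matching the split quoted above."""
    valid_x, valid_y = train_x[-n_valid:], train_y[-n_valid:]
    return (train_x[:-n_valid], train_y[:-n_valid]), (valid_x, valid_y)
```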
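
For the Experiment Setup row, a hedged sketch of the quoted hyperparameter grid and shallow-MLP settings. train_and_evaluate is a hypothetical stand-in for the paper's Theano/pylearn2 training run, and penalty stands for the quoted "penalty weight a"; only the grids, the initial λ, the 300-unit hidden layer, and the dropout retain probabilities are taken from the paper.

```python
from itertools import product

learning_rates = [1.0, 0.5, 0.1]           # quoted learning-rate grid
penalty_weights = [1.0, 0.1, 0.001]        # quoted grid for the penalty weight a
init_lambda = 10.0                         # initial lambda fixed at 10
hidden_units = 300                         # shallow MLP: one hidden layer of 300 units
retain_probs = (0.9, 0.75, 0.5, 0.5, 0.5)  # per-layer dropout retain probabilities


def train_and_evaluate(lr, penalty, lam0, n_hidden, retain):
    """Hypothetical placeholder for the paper's Theano/pylearn2 training run.

    It should train the network with the given settings and return its
    validation accuracy; it returns 0.0 here so the sketch runs as-is.
    """
    return 0.0


# Model selection over the 3 x 3 grid using the held-out validation set.
best = max(
    ((train_and_evaluate(lr, a, init_lambda, hidden_units, retain_probs), lr, a)
     for lr, a in product(learning_rates, penalty_weights)),
    key=lambda t: t[0],
)
print("best validation accuracy %.4f at lr=%g, penalty weight=%g" % best)
```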