Adaptive Normalized Risk-Averting Training for Deep Neural Networks
Authors: Zhiguang Wang, Tim Oates, James Lo
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In practice, we show how this training method is successfully applied for improved training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR10 datasets. Using simple experimental settings without pretraining and other tricks, we obtain results comparable or superior to those reported in recent literature on the same tasks using standard Conv Nets + MSE/cross entropy. |
| Researcher Affiliation | Academia | Zhiguang Wang, Tim Oates: Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County (zgwang813@gmail.com, oates@umbc.edu). James Lo: Department of Mathematics and Statistics, University of Maryland, Baltimore County (jameslo@umbc.edu). |
| Pseudocode | No | The paper includes mathematical equations for the loss function and gradients but does not provide any structured pseudocode or algorithm blocks. (A hedged sketch of the loss family appears below the table.) |
| Open Source Code | No | The paper mentions implementation in Python and Theano, and adaptation of pylearn2, but does not provide any link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | In practice, we show how this training method is successfully applied for improved training of deep neural networks to solve visual recognition tasks on the MNIST and CIFAR10 datasets. The MNIST dataset (Le Cun et al. 1998) consists of hand written digits 0-9 which are 28x28 in size. The CIFAR-10 dataset (Krizhevsky and Hinton 2009) is composed of 10 classes of natural images. |
| Dataset Splits | Yes | We use the last 10,000 images of the training set as validation data to select the hyperparameters and report performance on the test set. (A minimal split sketch appears below the table.) |
| Hardware Specification | No | The paper mentions "GPU acceleration" but does not specify any particular GPU model, CPU, or other hardware components used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Python and Theano" and adapting "pylearn2", but it does not specify any version numbers for these software dependencies. |
| Experiment Setup | Yes | The learning rate and penalty weight a are selected in {1, 0.5, 0.1} and {1, 0.1, 0.001}, respectively, on validation sets. The initial λ is fixed at 10. For the shallow MLPs, we follow the network layout in (Gui, Lo, and Peng 2014; Le Cun et al. 1998), which has only one hidden layer with 300 neurons. Dropout was applied to all layers of the deep network, with the probability of retaining a hidden unit being p = (0.9, 0.75, 0.5, 0.5, 0.5) for the different layers. (A configuration sketch appears below the table.) |
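
Since the Pseudocode row notes that only equations are given, the snippet below is a minimal NumPy sketch of a log-sum-exp style normalized risk-averting error in the family this paper builds on, assuming the formulation from Lo's earlier NRAE work. The adaptive λ schedule and the penalty term weighted by a that the paper describes are not reproduced here, and `per_example_errors` is an illustrative name rather than anything from the paper's code.

```python
import numpy as np

def normalized_risk_averting_error(per_example_errors, lam):
    """Log-sum-exp style normalized risk-averting error (illustrative only).

    per_example_errors: 1-D array of non-negative per-example losses
    lam: risk-sensitivity index lambda (lam > 0)

    As lam -> 0 this tends toward the mean error; for large lam it is
    dominated by the worst-case examples.  The paper's adaptive update of
    lambda and its penalty term (weight a) are not modeled here.
    """
    m = per_example_errors.shape[0]
    z = lam * per_example_errors
    z_max = z.max()                      # log-sum-exp shift for stability
    return (z_max + np.log(np.exp(z - z_max).sum() / m)) / lam


# Tiny usage example with made-up squared errors.
errors = np.array([0.01, 0.20, 0.05, 0.90])
print(normalized_risk_averting_error(errors, lam=10.0))
```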
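
The Dataset Splits row describes holding out the last 10,000 training images for validation. A minimal NumPy sketch, assuming a 60,000-image MNIST training set already loaded into arrays (the placeholder arrays stand in for pylearn2's own loaders, which the paper actually used):

```python
import numpy as np

# Placeholder arrays standing in for the 60,000 MNIST training images (28x28)
# and their labels; the paper relied on pylearn2's dataset handling.
X = np.zeros((60000, 28 * 28), dtype=np.float32)
y = np.zeros(60000, dtype=np.int64)

# The last 10,000 training images are held out for hyperparameter selection.
X_train, y_train = X[:50000], y[:50000]
X_valid, y_valid = X[50000:], y[50000:]

print(X_train.shape, X_valid.shape)   # (50000, 784) (10000, 784)
```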
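
The Experiment Setup row lists the reported search grid and dropout settings. The sketch below simply enumerates those values as plain Python; `train_and_validate` is a hypothetical placeholder for a user's own training routine, not code from the paper.

```python
from itertools import product

learning_rates = [1.0, 0.5, 0.1]              # reported learning-rate grid
penalty_weights = [1.0, 0.1, 0.001]           # reported grid for penalty weight a
initial_lambda = 10.0                         # initial lambda fixed at 10
dropout_retain = (0.9, 0.75, 0.5, 0.5, 0.5)   # per-layer retention probabilities
hidden_units = 300                            # shallow MLP: one hidden layer

for lr, a in product(learning_rates, penalty_weights):
    config = {
        "learning_rate": lr,
        "penalty_weight": a,
        "initial_lambda": initial_lambda,
        "dropout_retain": dropout_retain,
        "hidden_units": hidden_units,
    }
    # train_and_validate(config)  # hypothetical routine; the best (lr, a)
    #                             # pair is chosen on the held-out validation set
    print(config)
```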