Learning Deep ResNet Blocks Sequentially using Boosting Theory

Authors: Furong Huang, Jordan Ash, John Langford, Robert Schapire

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we compare BoostResNet with e2e BP over two types of feed-forward ResNets, multilayer perceptron residual network (MLP-ResNet) and convolutional neural network residual network (CNN-ResNet), on multiple datasets. BoostResNet shows substantial computational performance improvements and accuracy improvement under the MLP-ResNet architecture. Under CNN-ResNet, faster convergence for BoostResNet is observed.
Researcher Affiliation | Collaboration | Furong Huang (1), Jordan T. Ash (2), John Langford (3), Robert E. Schapire (3); (1) Department of Computer Science, University of Maryland; (2) Department of Computer Science, Princeton University; (3) Microsoft Research.
Pseudocode | Yes | Algorithm 1 BoostResNet: telescoping sum boosting for binary-class classification; Algorithm 2 BoostResNet: oracle implementation for training a ResNet block
Open Source Code | No | The paper states that experiments were programmed in the 'Torch deep learning framework for Lua' but does not provide a link or an explicit statement about making the source code for the proposed method publicly available.
Open Datasets | Yes | We compare our proposed BoostResNet algorithm with e2e BP training a ResNet on the MNIST (LeCun et al., 1998), street view house numbers (SVHN) (Netzer et al., 2011), and CIFAR-10 (Krizhevsky & Hinton, 2009) benchmark datasets.
Dataset Splits | Yes | Hyperparameters are selected via random search for highest accuracy on a validation set.
Hardware Specification | Yes | Our experiments are programmed in the Torch deep learning framework for Lua and executed on NVIDIA Tesla P100 GPUs.
Software Dependencies | No | The paper mentions using the 'Torch deep learning framework for Lua' and that 'All models are trained using the Adam variant of SGD', but it does not specify version numbers for these software components or libraries.
Experiment Setup | Yes | To list the hyperparameters we use in our BoostResNet training after searching over candidate hyperparameters, we optimize the learning rate to be 0.004 with a 9 × 10^-5 learning rate decay. The gamma threshold is optimized to be 0.001 and the initial gamma value on SVHN is 0.75. On the CIFAR-10 dataset, the main advantage of BoostResNet over e2e BP is the speed of training. BoostResNet refined with e2e BP obtains comparable results with e2e BP. This is because we are using a suboptimal architecture of ResNet which overfits the CIFAR-10 dataset. AdaBoost, on the other hand, is known to be resistant to overfitting. In BoostResNet training, we optimize the learning rate to be 0.014 with a 3.46 × 10^-5 learning rate decay. The gamma threshold is optimized to be 0.007 and the initial gamma value on CIFAR-10 is 0.93.
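
The Pseudocode row refers to Algorithm 1, which builds the network one residual block at a time, and the Experiment Setup row quotes a "gamma threshold" hyperparameter tied to each block's weak-learning edge. The sketch below is only a schematic of sequential block training with a gamma-based stopping rule, one plausible reading of how that threshold is used; it is not a reimplementation of the paper's Algorithm 1, whose telescoping-sum weighting is omitted, and make_block, train_block, and estimate_gamma are hypothetical placeholders.

```python
def boost_resnet_schematic(make_block, train_block, estimate_gamma,
                           gamma_threshold, max_blocks):
    """Grow a ResNet block by block, stopping when the estimated edge is too small.

    Schematic only: the paper's Algorithm 1 additionally maintains example
    weights and a telescoping-sum hypothesis, which are not modeled here.
    """
    blocks = []
    for t in range(max_blocks):
        block = make_block(t)                  # new residual block (placeholder)
        train_block(block, blocks)             # fit it on top of earlier blocks
        gamma = estimate_gamma(block, blocks)  # estimated weak-learning edge
        if gamma < gamma_threshold:            # edge too small: stop adding blocks
            break
        blocks.append(block)
    return blocks
```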
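The Dataset Splits row notes that hyperparameters are selected via random search for the highest accuracy on a validation set. A minimal, generic sketch of such a search follows; the search ranges and the train_and_evaluate callback are hypothetical, since the paper does not list its candidate ranges.

```python
import math
import random

# Hypothetical search space; the paper does not report its candidate ranges.
SEARCH_SPACE = {
    "learning_rate":   (1e-4, 1e-1),  # sampled log-uniformly
    "lr_decay":        (1e-6, 1e-3),  # sampled log-uniformly
    "gamma_threshold": (1e-4, 1e-1),  # sampled log-uniformly
}

def sample_config():
    """Draw one random configuration from the (assumed) search space."""
    return {
        name: math.exp(random.uniform(math.log(lo), math.log(hi)))
        for name, (lo, hi) in SEARCH_SPACE.items()
    }

def random_search(train_and_evaluate, n_trials=50):
    """Keep the configuration with the highest validation accuracy.

    train_and_evaluate(cfg) is a placeholder that trains a model with cfg
    and returns its accuracy on a held-out validation split.
    """
    best_cfg, best_acc = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config()
        acc = train_and_evaluate(cfg)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```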
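The Experiment Setup row quotes concrete learning rates, decays, and gamma values for SVHN and CIFAR-10, and the Software Dependencies row notes that all models are trained with the Adam variant of SGD. A minimal sketch of wiring those reported values into an Adam optimizer is shown below; it uses PyTorch purely for illustration (the original experiments ran in Torch for Lua), model is a placeholder, and the 1/(1 + decay * step) schedule mirrors Torch's learningRateDecay convention as an assumption, not the paper's stated rule.

```python
import torch

# Hyperparameters quoted in the paper for BoostResNet training.
HPARAMS = {
    "svhn":    {"lr": 0.004, "lr_decay": 9e-5,    "gamma_threshold": 0.001, "gamma_init": 0.75},
    "cifar10": {"lr": 0.014, "lr_decay": 3.46e-5, "gamma_threshold": 0.007, "gamma_init": 0.93},
}

def make_optimizer(model: torch.nn.Module, dataset: str):
    """Adam with the reported learning rate and a 1/(1 + decay * step) schedule.

    The decay rule is assumed; the paper only reports the decay value itself.
    """
    hp = HPARAMS[dataset]
    optimizer = torch.optim.Adam(model.parameters(), lr=hp["lr"])
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda step: 1.0 / (1.0 + hp["lr_decay"] * step)
    )
    return optimizer, scheduler
```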