Learning Deep ResNet Blocks Sequentially using Boosting Theory
Authors: Furong Huang, Jordan Ash, John Langford, Robert Schapire
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we compare BoostResNet with e2e BP over two types of feed-forward ResNets, multilayer perceptron residual network (MLP-ResNet) and convolutional neural network residual network (CNN-ResNet), on multiple datasets. BoostResNet shows substantial computational performance improvements and accuracy improvement under the MLP-ResNet architecture. Under CNN-ResNet, a faster convergence for BoostResNet is observed. |
| Researcher Affiliation | Collaboration | Furong Huang (1), Jordan T. Ash (2), John Langford (3), Robert E. Schapire (3). (1) Department of Computer Science, University of Maryland; (2) Department of Computer Science, Princeton University; (3) Microsoft Research. |
| Pseudocode | Yes | Algorithm 1 BoostResNet: telescoping sum boosting for binary-class classification; Algorithm 2 BoostResNet: oracle implementation for training a ResNet block (a hedged sketch of this sequential training loop appears below the table). |
| Open Source Code | No | The paper states that experiments were programmed in 'Torch deep learning framework for Lua' but does not provide a link or explicit statement about making the source code for their proposed method publicly available. |
| Open Datasets | Yes | We compare our proposed BoostResNet algorithm with e2e BP training a ResNet on the MNIST (LeCun et al., 1998), street view house numbers (SVHN) (Netzer et al., 2011), and CIFAR-10 (Krizhevsky & Hinton, 2009) benchmark datasets (a loading sketch appears below the table). |
| Dataset Splits | Yes | Hyperparameters are selected via random search for highest accuracy on a validation set. |
| Hardware Specification | Yes | Our experiments are programmed in the Torch deep learning framework for Lua and executed on NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Torch deep learning framework for Lua' and that 'All models are trained using the Adam variant of SGD', but it does not specify version numbers for these software components or libraries. |
| Experiment Setup | Yes | To list the hyperparameters we use in our BoostResNet training after searching over candidate hyperparameters: we optimize the learning rate to be 0.004 with a 9 × 10⁻⁵ learning rate decay; the gamma threshold is optimized to be 0.001 and the initial gamma value on SVHN is 0.75. On the CIFAR-10 dataset, the main advantage of BoostResNet over e2e BP is the speed of training. BoostResNet refined with e2e BP obtains comparable results with e2e BP. This is because we are using a suboptimal architecture of ResNet which overfits the CIFAR-10 dataset. AdaBoost, on the other hand, is known to be resistant to overfitting. In BoostResNet training, we optimize the learning rate to be 0.014 with a 3.46 × 10⁻⁵ learning rate decay; the gamma threshold is optimized to be 0.007 and the initial gamma value on CIFAR-10 is 0.93. (These settings are restated as a config sketch below the table.) |
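The Pseudocode row points to Algorithm 1 (telescoping sum boosting) and Algorithm 2 (the oracle that trains a single ResNet block). Below is a minimal, PyTorch-flavored sketch of that sequential, boosting-style training loop; the MLP block architecture, the weighted exponential loss inside the oracle, and the edge/weight updates are simplified assumptions on our part, not the authors' Torch/Lua implementation or the exact updates of Algorithm 1.

```python
# A minimal sketch of sequential, boosting-style training of residual blocks
# in the spirit of Algorithms 1-2 (binary labels in {-1, +1}).
# Block architecture, loss, and weight updates are assumptions, not the paper's code.
import torch
import torch.nn as nn

def train_block_oracle(block, head, feats, labels, weights, steps=200, lr=1e-3):
    """Assumed oracle: fit one residual block plus a linear head to a
    weighted exponential loss, then report the block's edge (gamma)."""
    opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(steps):
        scores = head(feats + block(feats)).squeeze(1)         # identity skip + residual
        loss = (weights * torch.exp(-labels * scores)).mean()  # weighted exponential loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        scores = head(feats + block(feats)).squeeze(1)
        gamma = (weights * labels * torch.tanh(scores)).sum() / weights.sum()
    return gamma.item()

def boost_resnet(feats, labels, n_blocks=5, dim=32, gamma_threshold=1e-3):
    """Train residual blocks one at a time, reweighting examples between blocks."""
    weights = torch.ones(len(labels)) / len(labels)            # uniform example weights
    blocks = []
    for _ in range(n_blocks):
        block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        head = nn.Linear(dim, 1)
        gamma = train_block_oracle(block, head, feats, labels, weights)
        if abs(gamma) < gamma_threshold:                       # stop once the edge is negligible
            break
        blocks.append(block)
        with torch.no_grad():
            scores = head(feats + block(feats)).squeeze(1)
            weights = weights * torch.exp(-labels * torch.tanh(scores))
            weights = weights / weights.sum()                  # AdaBoost-style reweighting
            feats = feats + block(feats)                       # next block sees the new representation
    return blocks
```

A toy invocation such as `boost_resnet(torch.randn(256, 32), torch.randint(0, 2, (256,)).float() * 2 - 1)` exercises the loop end to end; the key design point is that gradients never flow across blocks: each block is fit against the current example weights and then frozen.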
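The Open Datasets and Dataset Splits rows name MNIST, SVHN, and CIFAR-10, plus a validation set used to select hyperparameters by random search. The sketch below loads the three datasets with torchvision and carves out a validation split; torchvision is a modern stand-in for the paper's Torch/Lua pipeline, and the 45k/5k split size is an assumption, not taken from the paper.

```python
# Load the three benchmark datasets and hold out a validation split.
# torchvision replaces the paper's Torch/Lua loaders; split sizes are illustrative.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist   = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
svhn    = datasets.SVHN(root="data", split="train", download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

# Hold out part of the CIFAR-10 training set for hyperparameter selection
# (45,000 / 5,000 is an assumed split, not reported in the paper).
cifar_train, cifar_val = random_split(
    cifar10, [45000, 5000], generator=torch.Generator().manual_seed(0)
)
```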
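The Experiment Setup row reports the tuned BoostResNet hyperparameters for SVHN and CIFAR-10. The sketch below merely restates those numbers as a configuration dictionary; the decay schedule `lr_t = lr / (1 + t * decay)` mirrors Torch's `learningRateDecay` convention and is our assumption, since the paper does not spell out how the decay is applied.

```python
# Reported BoostResNet hyperparameters, restated as a config dictionary.
# The decay schedule below is an assumed convention, not stated in the paper.
BOOSTRESNET_CONFIG = {
    "SVHN": {
        "optimizer": "Adam",
        "learning_rate": 0.004,
        "learning_rate_decay": 9e-5,
        "gamma_threshold": 0.001,
        "initial_gamma": 0.75,
    },
    "CIFAR-10": {
        "optimizer": "Adam",
        "learning_rate": 0.014,
        "learning_rate_decay": 3.46e-5,
        "gamma_threshold": 0.007,
        "initial_gamma": 0.93,
    },
}

def lr_at_step(cfg: dict, step: int) -> float:
    """Learning rate after `step` updates under the assumed decay schedule."""
    return cfg["learning_rate"] / (1.0 + step * cfg["learning_rate_decay"])
```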