Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Authors: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander Alemi

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual and one Inception-v4 networks, we achieve 3.08% top-5 error on the test set of the ImageNet classification (CLS) challenge. (A sketch of the residual activation-scaling trick appears after the table.)
Researcher Affiliation | Industry | Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A. Alemi; Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA
Pseudocode | No | The paper contains detailed network architecture diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Open source implementations of the Inception-ResNet-v2 and Inception-v4 models in this paper as well as pre-trained weights are available at the TensorFlow Models GitHub page: github.com/tensorflow/models. (A hedged loading example appears after the table.)
Open Datasets | Yes | The dataset comprises 1.2 million training images, 50,000 validation images and 100,000 test images.
Dataset Splits | Yes | The dataset comprises 1.2 million training images, 50,000 validation images and 100,000 test images.
Hardware Specification | Yes | We have trained our networks with stochastic gradient descent, utilizing the TensorFlow (Abadi et al. 2015) distributed machine learning system using 20 replicas, each running an NVidia Kepler GPU. (A rough modern analogue of replicated training appears after the table.)
Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al. 2015)' and 'DistBelief (Dean et al. 2012)' but does not specify exact version numbers for these or any other software dependencies.
Experiment Setup | Yes | We used a learning rate of 0.045, decayed every two epochs using an exponential rate of 0.94. In addition, gradient clipping (Pascanu, Mikolov, and Bengio 2012) was found to be useful to stabilize the training. Our earlier experiments used momentum (Sutskever et al. 2013) with a decay of 0.9, while our best models were achieved using RMSProp (Tieleman and Hinton) with a decay of 0.9 and ε = 1.0. (A sketch of this optimizer configuration appears after the table.)
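
The activation scaling mentioned in the abstract is the paper's trick of scaling down the residual branch before it is added back to the shortcut, which the authors report stabilizes very wide residual variants. Below is a minimal Keras-style sketch assuming TensorFlow 2; the branch structure, filter counts, and the scale value of 0.2 are illustrative placeholders, not the exact Inception-ResNet-v2 block.

```python
import tensorflow as tf
from tensorflow.keras import layers

def scaled_residual_block(x, scale=0.2):
    """Illustrative residual block with activation scaling.

    The convolutional branches are simplified placeholders, not the exact
    Inception-ResNet-v2 block; the relevant part is the `scale` factor
    applied to the residual branch before the element-wise addition.
    """
    channels = x.shape[-1]
    shortcut = x

    # Simplified two-branch residual function.
    b1 = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(32, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(48, 3, padding="same", activation="relu")(b2)
    mixed = layers.Concatenate()([b1, b2])

    # Linear 1x1 projection back to the shortcut's channel count.
    residual = layers.Conv2D(channels, 1, padding="same")(mixed)

    # Scale the residual down before adding it to the shortcut; this is the
    # stabilization described for very wide residual Inception networks.
    scaled = layers.Lambda(lambda t: t * scale)(residual)
    return layers.Activation("relu")(layers.Add()([shortcut, scaled]))

# Quick shape check on a dummy 35x35 feature map with 320 channels.
features = tf.random.normal([2, 35, 35, 320])
print(scaled_residual_block(features).shape)  # (2, 35, 35, 320)
```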
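
For the open source code row, the paper's own release is the TF-Slim implementation and checkpoints at github.com/tensorflow/models. The sketch below instead loads the later Keras port of Inception-ResNet-v2 as a convenience for quick experimentation; it is a stand-in, not the original released checkpoint.

```python
import numpy as np
from tensorflow.keras.applications import inception_resnet_v2

# Keras re-implementation with ImageNet weights; this is a convenience port,
# not the original TF-Slim checkpoint from github.com/tensorflow/models.
model = inception_resnet_v2.InceptionResNetV2(weights="imagenet")

# Classify one 299x299 RGB image (random pixels here as a stand-in).
image = np.random.uniform(0, 255, size=(1, 299, 299, 3)).astype("float32")
preds = model.predict(inception_resnet_v2.preprocess_input(image))
print(inception_resnet_v2.decode_predictions(preds, top=5)[0])
```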
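
The hardware row describes 20 TensorFlow worker replicas, each with a Kepler GPU, trained with the 2016-era distributed TensorFlow setup. As a rough modern analogue only, a single-machine multi-GPU replica configuration can be sketched with tf.distribute; the dataset and training loop are omitted for brevity.

```python
import tensorflow as tf

# Rough modern analogue of replicated training (assumption: the paper used
# the older 2016-era distributed TensorFlow setup with 20 worker replicas,
# which this single-machine API does not reproduce exactly).
strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Untrained Inception-ResNet-v2 for ImageNet's 1000 classes.
    model = tf.keras.applications.InceptionResNetV2(weights=None, classes=1000)
    model.compile(optimizer="rmsprop",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# model.fit(train_dataset, ...) would then run one replica per GPU.
```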
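
Finally, the experiment setup row quotes the optimizer schedule: learning rate 0.045 decayed by a factor of 0.94 every two epochs, RMSProp with decay 0.9 and ε = 1.0, plus gradient clipping. A minimal sketch with the current tf.keras optimizer API follows; the steps_per_epoch value and the clipnorm threshold are assumptions, since the table states neither the batch size nor the clipping value.

```python
import tensorflow as tf

steps_per_epoch = 10000   # assumption: depends on batch size and sharding
initial_lr = 0.045        # from the paper
decay_rate = 0.94         # exponential decay applied every two epochs

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=initial_lr,
    decay_steps=2 * steps_per_epoch,   # "decayed every two epochs"
    decay_rate=decay_rate,
    staircase=True,
)

# RMSProp with decay (rho) 0.9 and epsilon 1.0; clipnorm stands in for the
# gradient clipping the paper mentions, with an illustrative threshold.
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=lr_schedule,
    rho=0.9,
    epsilon=1.0,
    clipnorm=2.0,
)
```

The earlier momentum runs mentioned in the same quote would correspond to tf.keras.optimizers.SGD(momentum=0.9) under the same learning-rate schedule.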