Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Authors: Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry P. Vetrov, Andrew G. Wilson

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.
Researcher Affiliation | Collaboration | Samsung AI Center in Moscow; Skolkovo Institute of Science and Technology; Cornell University; Samsung-HSE Laboratory, National Research University Higher School of Economics; National Research University Higher School of Economics
Pseudocode | Yes | An outline of the algorithm is provided in the supplement.
Open Source Code | Yes | We release the code for reproducing the results in this paper at https://github.com/timgaripov/dnn-mode-connectivity
Open Datasets | Yes | We test VGG-16 [19], a 28-layer Wide ResNet with widening factor 10 [22] and a 158-layer ResNet [9] on CIFAR-10, and VGG-16, 164-layer ResNet-bottleneck [9] on CIFAR-100. ImageNet ILSVRC-2012 [18] is a large-scale dataset containing 1.2 million training images and 50000 validation images divided into 1000 classes.
Dataset Splits | Yes | ImageNet ILSVRC-2012 [18] is a large-scale dataset containing 1.2 million training images and 50000 validation images divided into 1000 classes.
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as GPU/CPU models, clock speeds, or memory amounts.
Software Dependencies | No | The paper does not list the ancillary software needed to replicate the experiments (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4).
Experiment Setup | Yes | For FGE, with VGG we use cycle length c = 2 epochs, and a total of 22 models in the final ensemble. With ResNet and Wide ResNet we use c = 4 epochs, and the total number of models in the final ensemble is 12 for Wide ResNets and 6 for ResNets. For VGG we set the learning rates to α₁ = 10⁻², α₂ = 5·10⁻⁴; for ResNet and Wide ResNet models we set α₁ = 5·10⁻², α₂ = 5·10⁻⁴.
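
For readers trying to reproduce the setup above, the hyperparameters describe FGE's cyclical learning-rate schedule: within each cycle of c epochs the rate ramps linearly from α₁ down to α₂ and back, and one ensemble member is saved each time the rate bottoms out. Below is a minimal sketch under that reading; fge_lr, run_fge, train_batch, and snapshot are hypothetical names for illustration, not part of the released code.

```python
def fge_lr(iteration, iters_per_cycle, alpha_1, alpha_2):
    """Piecewise-linear rate: alpha_1 -> alpha_2 over the first half-cycle, back to alpha_1 over the second."""
    t = ((iteration % iters_per_cycle) + 1) / iters_per_cycle  # position within the cycle, in (0, 1]
    if t <= 0.5:
        return (1.0 - 2.0 * t) * alpha_1 + 2.0 * t * alpha_2
    return (2.0 - 2.0 * t) * alpha_2 + (2.0 * t - 1.0) * alpha_1


def run_fge(num_cycles, iters_per_cycle, alpha_1, alpha_2, train_batch, snapshot):
    """Run FGE-style training and collect one ensemble member per cycle.

    train_batch(lr) is assumed to perform one SGD step at learning rate lr;
    snapshot() is assumed to return a copy of the current weights. Both are placeholders.
    """
    ensemble = []
    for i in range(num_cycles * iters_per_cycle):
        lr = fge_lr(i, iters_per_cycle, alpha_1, alpha_2)
        train_batch(lr)
        # Mid-cycle the rate reaches its minimum alpha_2; save an ensemble member there.
        if (i % iters_per_cycle) + 1 == iters_per_cycle // 2:
            ensemble.append(snapshot())
    return ensemble
```

With the CIFAR settings quoted above, iters_per_cycle would be c (2 or 4 epochs) times the number of mini-batches per epoch, and alpha_1/alpha_2 would be set per architecture as in the Experiment Setup row.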