Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Authors: Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry P. Vetrov, Andrew G. Wilson
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet. |
| Researcher Affiliation | Collaboration | ¹Samsung AI Center in Moscow, ²Skolkovo Institute of Science and Technology, ³Cornell University, ⁴Samsung-HSE Laboratory, National Research University Higher School of Economics, ⁵National Research University Higher School of Economics |
| Pseudocode | Yes | An outline of the algorithm is provided in the supplement. |
| Open Source Code | Yes | We release the code for reproducing the results in this paper at https://github.com/timgaripov/dnn-mode-connectivity |
| Open Datasets | Yes | We test VGG-16 [19], a 28-layer Wide ResNet with widening factor 10 [22] and a 158-layer ResNet [9] on CIFAR-10, and VGG-16, 164-layer ResNet-bottleneck [9] on CIFAR-100. ImageNet ILSVRC-2012 [18] is a large-scale dataset containing 1.2 million training images and 50000 validation images divided into 1000 classes. |
| Dataset Splits | Yes | ImageNet ILSVRC-2012 [18] is a large-scale dataset containing 1.2 million training images and 50000 validation images divided into 1000 classes. |
| Hardware Specification | No | The paper does not report the hardware used to run its experiments, such as GPU/CPU models, processor speeds, or memory amounts. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.8, CPLEX 12.4) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | For FGE, with VGG we use cycle length c = 2 epochs, and a total of 22 models in the final ensemble. With ResNet and Wide ResNet we use c = 4 epochs, and the total number of models in the final ensemble is 12 for Wide ResNets and 6 for ResNets. For VGG we set the learning rates to α1 = 10⁻², α2 = 5·10⁻⁴; for ResNet and Wide ResNet models we set α1 = 5·10⁻², α2 = 5·10⁻⁴. (A sketch of this cyclical schedule follows the table.) |
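
The learning rates quoted above drive FGE's triangular cyclical schedule: within each cycle the rate decreases linearly from α1 to α2, then increases back, with ensemble checkpoints collected at the mid-cycle α2 minima. Below is a minimal sketch of that schedule, assuming the piecewise-linear form described in the paper; the function name and the per-epoch iteration count in the example are illustrative assumptions, not taken from the authors' released code.

```python
def fge_learning_rate(iteration, iters_per_cycle, alpha_1, alpha_2):
    """Triangular cyclical learning rate used by FGE (sketch).

    The rate ramps linearly from alpha_1 down to alpha_2 over the first
    half of each cycle, then back up to alpha_1 over the second half.
    Ensemble checkpoints are collected at the mid-cycle alpha_2 points.
    """
    # Position within the current cycle, in (0, 1].
    t = ((iteration % iters_per_cycle) + 1) / iters_per_cycle
    if t <= 0.5:
        return (1 - 2 * t) * alpha_1 + 2 * t * alpha_2
    return (2 * t - 1) * alpha_1 + (2 - 2 * t) * alpha_2


# Example with the quoted VGG settings (c = 2 epochs, alpha_1 = 1e-2,
# alpha_2 = 5e-4), assuming 391 iterations per epoch (an illustrative
# value for CIFAR-10 with batch size 128).
iters_per_cycle = 2 * 391
for i in (0, 390, 781):
    print(i, fge_learning_rate(i, iters_per_cycle, 1e-2, 5e-4))
```

At iteration 390 (mid-cycle) the rate reaches α2 = 5·10⁻⁴, which is where a model would be added to the ensemble; by iteration 781 (end of cycle) it has climbed back to α1.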