Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Authors: Gregory Benton, Wesley Maddox, Sanae Lotfi, Andrew Gordon Wilson
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we show the existence of mode connecting volumes and provide a lower bound on the dimensionality of these volumes. Building on these insights, in Section 5 we introduce ESPRO, a state-of-the-art approach to ensembling with neural networks, which efficiently averages over simplexes. In Section 6, we show that ESPRO also provides well-calibrated representations of uncertainty. Figure 8 shows a comparison of test error rate for ensembles of VGG-16 models over different numbers of ensemble components and simplex sizes on CIFAR-10 and CIFAR-100. (An illustrative sketch of simplex-averaged prediction is given after the table.) |
| Researcher Affiliation | Academia | Gregory W. Benton¹, Wesley J. Maddox¹, Sanae Lotfi¹, Andrew Gordon Wilson¹. ¹New York University. Correspondence to: Gregory W. Benton <gwb260@nyu.edu>. |
| Pseudocode | No | The paper describes the SPRO algorithm (e.g., 'The procedure to train these connecting θj forms the core of the SPRO algorithm, given here.') in paragraph text, but no formally labeled 'Pseudocode' or 'Algorithm' block is present. |
| Open Source Code | Yes | Code is available at https://github.com/g-benton/loss-surface-simplexes. |
| Open Datasets | Yes | Figure 3 shows loss surface visualizations of this simplicial complex in the parameter space of a VGG-16 network trained on CIFAR-10. We show additional results with image transformers (Dosovitskiy et al., 2021) on CIFAR-100 in Appendix B.3, emphasizing that these simplexes are not specific to a particular architecture. |
| Dataset Splits | No | The paper mentions training and test sets (e.g., 'greater than 98% accuracy on the train set', 'Test error as a function of total training budget on CIFAR-10') and refers to standard datasets such as CIFAR-10 and CIFAR-100, but it does not explicitly provide training/validation/test splits (e.g., percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU models, memory amounts, or detailed computer specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1'). |
| Experiment Setup | Yes | Adding a vertex takes only an additional 10 epochs of training on CIFAR-10, and 20 epochs of training on CIFAR-100. We provide details about how we choose λj in Appendix A.2. The models in Figure 8 are trained on CIFAR-10 for 200 epochs and CIFAR-100 for 300 epochs. |
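
Since the excerpts above describe ESPRO only in prose ("efficiently averages over simplexes") and note the absence of a labeled algorithm block, the following is a minimal, unofficial sketch of the simplex-averaging idea: draw convex-combination weights from a flat Dirichlet, form the corresponding point in a simplex of trained parameter vertices, and average predictions over several such points. The function names (`sample_simplex_weights`, `simplex_ensemble_predict`) and the use of plain PyTorch `state_dict`s are illustrative assumptions; the authors' actual implementation is in the linked repository (https://github.com/g-benton/loss-surface-simplexes).

```python
import torch


def sample_simplex_weights(n_vertices):
    """Draw uniform convex-combination weights over the probability simplex.

    A flat Dirichlet(1, ..., 1) is uniform over the simplex. Hypothetical helper,
    not taken from the paper's code.
    """
    return torch.distributions.Dirichlet(torch.ones(n_vertices)).sample()


@torch.no_grad()
def simplex_ensemble_predict(model, vertex_params, x, n_samples=10):
    """Average class probabilities over random parameter points in one simplex.

    model:         a torch.nn.Module with the same architecture as the vertices
    vertex_params: list of state_dicts, one per simplex vertex (same keys/shapes)
    x:             input batch
    """
    n_vertices = len(vertex_params)
    avg_probs = 0.0
    for _ in range(n_samples):
        w = sample_simplex_weights(n_vertices)
        # A convex combination of vertex parameters is a point inside the simplex.
        mixed = {
            key: sum(w[i] * vertex_params[i][key].float() for i in range(n_vertices))
            for key in vertex_params[0]
        }
        model.load_state_dict(mixed)
        avg_probs = avg_probs + torch.softmax(model(x), dim=-1)
    return avg_probs / n_samples
```

An ensemble over several independently trained simplexes (as in the paper's ESPRO experiments) would, under this sketch, simply average `simplex_ensemble_predict` outputs across simplexes as well.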