What Can Linear Interpolation of Neural Network Loss Landscapes Tell Us?
Authors: Tiffany J Vlaar, Jonathan Frankle
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we put inferences of this kind to the test, systematically evaluating how linear interpolation and final performance vary when altering the data, choice of initialization, and other optimizer and architecture design choices. |
| Researcher Affiliation | Collaboration | Tiffany Vlaar (Department of Mathematics, University of Edinburgh, Edinburgh, United Kingdom); Jonathan Frankle (MosaicML). |
| Pseudocode | No | The paper describes methods in prose and through equations (e.g., Eq. 1, Eq. 2) but does not include any explicitly labeled pseudocode or algorithm blocks (a hedged sketch of the interpolation procedure appears below the table). |
| Open Source Code | No | The paper does not contain any explicit statements about the release of its source code or links to a code repository. |
| Open Datasets | Yes | We focus on a ResNet-18 (He et al., 2016) architecture with batch normalization trained for 100 epochs on CIFAR-10 data (Krizhevsky & Hinton, 2009) |
| Dataset Splits | No | The paper discusses training on CIFAR-10 data and evaluates test accuracy but does not explicitly state the dataset splits for training, validation, and testing (e.g., percentages or counts for each split). |
| Hardware Specification | Yes | We perform all our experiments in PyTorch using NVIDIA DGX-1 GPUs and use standard random PyTorch initialization. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for the software dependency. |
| Experiment Setup | Yes | We focus on a ResNet-18 (He et al., 2016) architecture with batch normalization trained for 100 epochs on CIFAR-10 data (Krizhevsky & Hinton, 2009) using SGD with momentum (0.9) and weight decay (5e-4) using PyTorch (Paszke et al., 2017). We use initial learning rate h = 0.1 that drops by 10x at epochs 33 and 66. For pretrained settings, we use initial learning rate h = 0.001 that drops by 10x after 30 epochs. |
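
The Pseudocode row notes that the interpolation procedure is given only in prose and equations. Below is a minimal sketch, not taken from the paper, of what linear interpolation between an initial and a final set of network parameters typically looks like: the weights are blended as theta(alpha) = (1 - alpha) * theta_init + alpha * theta_final and the loss is evaluated at each alpha. The function name, its arguments, and the handling of batch-norm statistics are assumptions for illustration only.

```python
# Hedged sketch of loss interpolation between two parameter settings of the
# same architecture (e.g., weights at initialization and after training).
import copy
import torch

def interpolate_loss(model_init, model_final, loss_fn, data_loader,
                     num_points=21, device="cpu"):
    """Return (alphas, losses) along the line between the two parameter sets."""
    init_params = [p.detach().clone() for p in model_init.parameters()]
    final_params = [p.detach().clone() for p in model_final.parameters()]
    # Probe model reuses model_final's buffers (e.g., batch-norm running stats);
    # those are not interpolated here, which is a simplification.
    probe = copy.deepcopy(model_final).to(device).eval()

    alphas = torch.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        with torch.no_grad():
            # Overwrite the probe's weights with the interpolated parameters.
            for p, p0, p1 in zip(probe.parameters(), init_params, final_params):
                p.copy_((1.0 - alpha) * p0 + alpha * p1)
            total, count = 0.0, 0
            for x, y in data_loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(probe(x), y).item() * x.size(0)
                count += x.size(0)
        losses.append(total / count)
    return alphas.tolist(), losses
```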
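
The Experiment Setup row quotes the training configuration (ResNet-18 on CIFAR-10, 100 epochs, SGD with momentum 0.9 and weight decay 5e-4, learning rate 0.1 dropped by 10x at epochs 33 and 66). The sketch below maps those hyperparameters onto standard PyTorch calls; the torchvision model variant and the MultiStepLR schedule are assumptions, since the authors' own code is not released.

```python
# Hedged sketch of the reported training configuration; not the authors' code.
import torch
import torchvision

# torchvision's ResNet-18 uses an ImageNet-style stem; CIFAR variants often
# replace it with a 3x3 convolution. Treat this as an assumption.
model = torchvision.models.resnet18(num_classes=10)

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[33, 66], gamma=0.1  # 10x drops at epochs 33 and 66
)

for epoch in range(100):
    # ... standard training loop over CIFAR-10 minibatches ...
    scheduler.step()
```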