Linear Mode Connectivity and the Lottery Ticket Hypothesis
Authors: Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet). |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL, ²Element AI, ³University of Toronto, ⁴Vector Institute. |
| Pseudocode | Yes | Algorithm 1: Compute instability of W_k with function f. (Hedged sketches of this computation and of IMP follow the table.) |
| Open Source Code | No | The paper does not provide an explicit statement of open-source code availability for its methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We study image classification networks on MNIST, CIFAR-10, and ImageNet as listed in Table 1. |
| Dataset Splits | No | The paper mentions using train and test sets for evaluation (e.g., 'test set instability'), but it does not provide explicit details about the dataset splits (e.g., specific percentages or sample counts for training, validation, and testing). |
| Hardware Specification | No | The paper mentions 'GPU resources' and 'TPU resources' from IBM and Google respectively, but does not provide specific hardware models (e.g., GPU/CPU models, memory details) for these resources. |
| Software Dependencies | No | The paper mentions using the 'TensorFlow Research Cloud' but does not specify any software names with version numbers for libraries, frameworks, or operating systems used in the experiments. |
| Experiment Setup | Yes | Table 1. Our networks and hyperparameters. Accuracies are the means and standard deviations across three initializations. Hyperparameters for ResNet-20 standard are from He et al. (2016). Hyperparameters for VGG-16 standard are from Liu et al. (2019). Hyperparameters for low, warmup, and LeNet are adapted from Frankle & Carbin (2019). Hyperparameters for ImageNet networks are from Google's reference TPU code (Google, 2018). |
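The paper's Algorithm 1 measures instability as the error barrier along the linear path between two children trained from the same weights W_k under different samples of SGD noise. Below is a minimal PyTorch sketch of that computation, not the authors' code: `model`, `eval_error`, and the state dicts are hypothetical placeholders, and the 30-point interpolation grid is an assumption about the evaluation resolution.

```python
import copy
import torch

def interpolate_state(sd1, sd2, alpha):
    """(1 - alpha) * sd1 + alpha * sd2, elementwise over floating-point
    tensors; integer buffers (e.g., BatchNorm's num_batches_tracked)
    are copied from sd1 unchanged."""
    out = {}
    for key in sd1:
        if sd1[key].is_floating_point():
            out[key] = (1 - alpha) * sd1[key] + alpha * sd2[key]
        else:
            out[key] = sd1[key]
    return out

def instability(model, sd1, sd2, eval_error, num_points=30):
    """Error barrier between two solutions trained from the same W_k.

    sd1, sd2: state dicts of the two trained children.
    eval_error: hypothetical helper returning a model's test error.
    Returns the maximum error along the linear path minus the mean of
    the two endpoint errors (near zero when the networks are stable)."""
    errors = []
    for alpha in torch.linspace(0.0, 1.0, num_points):
        child = copy.deepcopy(model)
        child.load_state_dict(interpolate_state(sd1, sd2, alpha.item()))
        errors.append(eval_error(child))
    return max(errors) - 0.5 * (errors[0] + errors[-1])
```

The IMP procedure the paper studies can be sketched in the same spirit: train to completion, prune the lowest-magnitude fraction of surviving weights globally, rewind the survivors to their values at step k, and repeat. The `train_fn` callable is an assumed helper, and the 20% per-round rate follows the lottery-ticket literature; everything else is illustrative.

```python
def prune_lowest(sd, masks, frac=0.2):
    """Drop the lowest-magnitude `frac` of currently surviving weights."""
    survivors = torch.cat([sd[k][masks[k].bool()].abs() for k in masks])
    cutoff = survivors.kthvalue(max(1, int(frac * survivors.numel()))).values
    return {k: masks[k] * (sd[k].abs() > cutoff).float() for k in masks}

def imp(rewind_sd, train_fn, rounds=10, frac=0.2):
    """Iterative magnitude pruning with rewinding to step k (sketch).

    rewind_sd: weights W_k saved at iteration k (k = 0 recovers the
    original lottery-ticket procedure). train_fn(start_sd, masks) is
    assumed to train the masked network to completion and return its
    final state dict."""
    masks = {k: torch.ones_like(v) for k, v in rewind_sd.items()
             if v.is_floating_point()}
    for _ in range(rounds):
        final_sd = train_fn(rewind_sd, masks)
        masks = prune_lowest(final_sd, masks, frac)
    return masks  # apply to rewind_sd and retrain the subnetwork
```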