Mechanistic Mode Connectivity
Authors: Ekdeep Singh Lubana, Eric J Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We augment these first steps towards a mechanistic characterization of loss landscapes with extensive empirical verification over a broad variety of settings, including different datasets, architectures, connectivity paths, and training strategies. Extensive experiments on synthetic datasets show CBFT is more effective than recent methods (Kirichenko et al., 2022a;b; Kumar et al., 2022) at reducing a model's tendency to rely on spurious attributes. |
| Researcher Affiliation | Collaboration | (1) EECS Department, University of Michigan, Ann Arbor, MI, USA; (2) Center for Brain Science, Harvard University, Cambridge, MA, USA; (3) Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA; (4) University of Cambridge, UK. |
| Pseudocode | No | The paper describes the Connectivity-Based Fine-Tuning (CBFT) method using mathematical equations and textual descriptions but does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Code is available at: https://github.com/EkdeepSLubana/MMC. |
| Open Datasets | Yes | Synthetic Datasets. Following the protocol above, we embed synthetic cues in three existing datasets: (1) CIFAR-10 with 3×3 box cues whose locations depend on the target label; (2) CIFAR-100 with 3×3 box cues colored according to the first digit of the object label, and located according to the second digit; and (3) Dominoes (Shah et al., 2020), where CIFAR-10 images are concatenated with Fashion-MNIST images of the same class. (A hedged sketch of this cue-embedding protocol appears after the table.) |
| Dataset Splits | Yes | We train models using SGD on the synthetic data with cue features (47,500 samples), reserving the remaining 2,500 training samples as clean data. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch data-loaders' but does not specify the version number of PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | When training from scratch (e.g., in Fig. 4), we train models using SGD for 100 epochs with a batch size of 256, momentum of 0.9, and weight decay of 10⁻⁴. The learning rate starts at 0.1 and is dropped by a factor of 10 at the 40th and 80th epochs. ... We run CBFT for 20 epochs, using an initial learning rate of 0.01 with a cosine decay schedule (similar to the baselines). The method turns out to be fairly robust to the exact value of λ1; we therefore fix it to 1 for all experiments without any explicit tuning. (A hedged sketch of this training recipe appears after the table.) |
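
The cue-embedding protocol quoted in the Open Datasets row can be approximated in a few lines of NumPy. The sketch below is a minimal illustration, not the authors' implementation: the white cue colour and the rule mapping each label to a top-left corner are assumptions, since the paper only states that the 3×3 cue's location depends on the target label.

```python
import numpy as np

def add_box_cue(image: np.ndarray, label: int, box_size: int = 3) -> np.ndarray:
    """Embed a small box cue whose location depends on the target label.

    `image` is a 32x32x3 uint8 CIFAR-10 image. The placement rule
    (column offset = label * box_size along the top rows) and the white
    cue colour are illustrative assumptions.
    """
    img = image.copy()
    col = label * box_size                        # labels 0-9 -> columns 0..27
    img[0:box_size, col:col + box_size, :] = 255  # paint the 3x3 box cue
    return img

# Illustrative split matching the quoted 47,500 / 2,500 partition of the
# 50,000 CIFAR-10 training images into cue-augmented and clean subsets.
indices = np.random.permutation(50_000)
cue_indices, clean_indices = indices[:47_500], indices[47_500:]
```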
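
The from-scratch training recipe in the Experiment Setup row maps onto standard PyTorch components. The sketch below encodes only the quoted hyperparameters (SGD, learning rate 0.1, momentum 0.9, weight decay 10⁻⁴, batch size 256, 10× drops at epochs 40 and 80); the ResNet-18 backbone and the random tensors standing in for the cue-augmented data are placeholder assumptions.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Placeholder model and data; the paper trains on cue-augmented CIFAR variants.
model = resnet18(num_classes=10)
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=256, shuffle=True,
)

# Quoted hyperparameters: SGD, lr 0.1, momentum 0.9, weight decay 1e-4,
# with the learning rate dropped 10x at epochs 40 and 80.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 80], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the CBFT fine-tuning phase quoted in the same row, the analogous (assumed) change would be 20 epochs at an initial learning rate of 0.01 with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)` in place of the step schedule.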