Mechanistic Mode Connectivity

Authors: Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We augment these first steps towards a mechanistic characterization of loss landscapes with extensive empirical verification over a broad variety of settings, including different datasets, architectures, connectivity paths, and training strategies. Extensive experiments on synthetic datasets show CBFT is more effective than recent methods (Kirichenko et al., 2022a;b; Kumar et al., 2022) at reducing a model's tendency to rely on spurious attributes.
Researcher Affiliation | Collaboration | (1) EECS Department, University of Michigan, Ann Arbor, MI, USA; (2) Center for Brain Science, Harvard University, Cambridge, MA, USA; (3) Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA; (4) University of Cambridge, UK.
Pseudocode | No | The paper describes the Connectivity-Based Fine-Tuning (CBFT) method using mathematical equations and textual descriptions but does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Code is available at: https://github.com/EkdeepSLubana/MMC
Open Datasets | Yes | Synthetic Datasets. Following the protocol above, we embed synthetic cues in three existing datasets: (1) CIFAR-10 with 3×3 box cues whose locations depend on the target label; (2) CIFAR-100 with 3×3 box cues colored according to the first digit of the object label and located according to the second digit; and (3) Dominoes (Shah et al., 2020), where CIFAR-10 images are concatenated with Fashion-MNIST images of the same class.
Dataset Splits | Yes | We train models using SGD on the synthetic data with cue features (47,500 samples), reserving the remaining 2,500 training samples as clean data (an illustrative construction-and-split sketch appears below the table).
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'PyTorch data-loaders' but does not specify the version number of PyTorch or any other software dependencies with their versions.
Experiment Setup | Yes | When training from scratch (e.g., in Fig. 4), we train models using SGD for 100 epochs with a batch size of 256, momentum of 0.9, and weight decay of 10^-4. The learning rate starts at 0.1 and is dropped by a factor of 10 at the 40th and 80th epochs. ... We run CBFT for 20 epochs, using an initial learning rate of 0.01 with a cosine decay schedule (similar to the baselines). The method turns out to be fairly robust to the exact value of λ1; we therefore fix it to 1 for all experiments without any explicit tuning (a hedged sketch of this recipe appears below the table).
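
The Open Datasets and Dataset Splits rows describe embedding a label-dependent 3×3 box cue and reserving 2,500 clean training images out of CIFAR-10's 50,000. The sketch below is a minimal, hypothetical reconstruction of that protocol: the placement rule, white cue color, and random seed are illustrative assumptions, not taken from the paper or its repository.

```python
# Hypothetical sketch of the cue-embedding protocol: stamp a 3x3 box whose
# location is a deterministic function of the class label, and reserve 2,500
# clean training images. Placement rule, cue color, and seed are assumptions.
import numpy as np
import torch
from torch.utils.data import random_split
from torchvision import datasets

def add_box_cue(img: np.ndarray, label: int, size: int = 3) -> np.ndarray:
    """Stamp a white size x size box at a label-dependent location (HxWx3 uint8)."""
    img = img.copy()
    h, w, _ = img.shape
    row = (label % 5) * (h - size) // 4   # map 10 labels onto a 5x2 grid of positions
    col = (label // 5) * (w - size)
    img[row:row + size, col:col + size, :] = 255
    return img

train_set = datasets.CIFAR10(root="./data", train=True, download=True)
cue_subset, clean_subset = random_split(
    train_set, [47500, 2500], generator=torch.Generator().manual_seed(0))

# Cues are embedded only in the 47,500-sample subset; the 2,500 clean samples
# are held out as cue-free data.
cue_images = [add_box_cue(train_set.data[i], train_set.targets[i])
              for i in cue_subset.indices]
```

The 5×2 placement grid is just one way to make the cue location depend on the label; the paper's exact placement scheme may differ.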
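
The Experiment Setup row can likewise be mirrored as a short PyTorch sketch of the reported from-scratch recipe (SGD, 100 epochs, batch size 256, momentum 0.9, weight decay 10^-4, learning rate 0.1 dropped 10x at epochs 40 and 80). The ResNet-18 architecture and plain CIFAR-10 loader below are stand-ins, not the paper's exact configuration.

```python
# Hedged sketch of the reported training recipe; the model choice and the
# plain CIFAR-10 loader are placeholders for illustration only.
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.resnet18(num_classes=10)
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor()),
    batch_size=256, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# LR 0.1, dropped by a factor of 10 at epochs 40 and 80, as reported.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 80], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

# The reported CBFT fine-tuning stage (20 epochs, initial LR 0.01, cosine decay)
# would pair an SGD optimizer with a matching scheduler, e.g.:
# ft_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# ft_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(ft_optimizer, T_max=20)
```

The CBFT objective itself (including its λ1 term) is not sketched here, since its equations are not reproduced in this summary.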