Mechanistic Mode Connectivity

Authors: Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We augment these first steps towards a mechanistic characterization of loss landscapes with extensive empirical verification over a broad variety of settings, including different datasets, architectures, connectivity paths, and training strategies. Extensive experiments on synthetic datasets show CBFT is more effective than recent methods (Kirichenko et al., 2022a;b; Kumar et al., 2022) at reducing a model's tendency to rely on spurious attributes.
Researcher Affiliation | Collaboration | (1) EECS Department, University of Michigan, Ann Arbor, MI, USA; (2) Center for Brain Science, Harvard University, Cambridge, MA, USA; (3) Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA; (4) University of Cambridge, UK.
Pseudocode | No | The paper describes the Connectivity-Based Fine-Tuning (CBFT) method using mathematical equations and textual descriptions but does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Code is available at: https://github.com/EkdeepSLubana/MMC
Open Datasets | Yes | Synthetic Datasets. Following the protocol above, we embed synthetic cues in three existing datasets: (1) CIFAR-10 with 3×3 box cues whose locations depend on the target label; (2) CIFAR-100 with 3×3 box cues colored according to the first digit of the object label and located according to the second digit; and (3) Dominoes (Shah et al., 2020), where CIFAR-10 images are concatenated with Fashion-MNIST images of the same class.
Dataset Splits | Yes | We train models using SGD on the synthetic data with cue features (47,500 samples), reserving the remaining 2,500 training samples as clean data (an illustrative construction-and-split sketch appears below the table).
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'PyTorch data-loaders' but does not specify the version number of PyTorch or any other software dependencies with their versions.
Experiment Setup | Yes | When training from scratch (e.g., in Fig. 4), we train models using SGD for 100 epochs with a batch size of 256, momentum of 0.9, and weight decay of 10^-4. The learning rate starts at 0.1 and is dropped by a factor of 10 at the 40th and 80th epochs. ... We run CBFT for 20 epochs, using an initial learning rate of 0.01 with a cosine decay schedule (similar to the baselines). The method turns out to be fairly robust to the exact value of λ1; we therefore fix it to 1 for all experiments without any explicit tuning (a hedged sketch of this recipe appears below the table).
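
The Open Datasets and Dataset Splits rows describe embedding a label-dependent 3×3 box cue and reserving 2,500 clean training images out of CIFAR-10's 50,000. The sketch below is a minimal, hypothetical reconstruction of that protocol: the placement rule, white cue color, and random seed are illustrative assumptions, not taken from the paper or its repository.

```python
# Hypothetical sketch of the cue-embedding protocol: stamp a 3x3 box whose
# location is a deterministic function of the class label, and reserve 2,500
# clean training images. Placement rule, cue color, and seed are assumptions.
import numpy as np
import torch
from torch.utils.data import random_split
from torchvision import datasets

def add_box_cue(img: np.ndarray, label: int, size: int = 3) -> np.ndarray:
    """Stamp a white size x size box at a label-dependent location (HxWx3 uint8)."""
    img = img.copy()
    h, w, _ = img.shape
    row = (label % 5) * (h - size) // 4   # map 10 labels onto a 5x2 grid of positions
    col = (label // 5) * (w - size)
    img[row:row + size, col:col + size, :] = 255
    return img

train_set = datasets.CIFAR10(root="./data", train=True, download=True)
cue_subset, clean_subset = random_split(
    train_set, [47500, 2500], generator=torch.Generator().manual_seed(0))

# Cues are embedded only in the 47,500-sample subset; the 2,500 clean samples
# are held out as cue-free data.
cue_images = [add_box_cue(train_set.data[i], train_set.targets[i])
              for i in cue_subset.indices]
```

The 5×2 placement grid is just one way to make the cue location depend on the label; the paper's exact placement scheme may differ.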
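
The Experiment Setup row can likewise be mirrored as a short PyTorch sketch of the reported from-scratch recipe (SGD, 100 epochs, batch size 256, momentum 0.9, weight decay 10^-4, learning rate 0.1 dropped 10x at epochs 40 and 80). The ResNet-18 architecture and plain CIFAR-10 loader below are stand-ins, not the paper's exact configuration.

```python
# Hedged sketch of the reported training recipe; the model choice and the
# plain CIFAR-10 loader are placeholders for illustration only.
import torch
import torchvision
from torchvision import transforms

model = torchvision.models.resnet18(num_classes=10)
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=transforms.ToTensor()),
    batch_size=256, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# LR 0.1, dropped by a factor of 10 at epochs 40 and 80, as reported.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[40, 80], gamma=0.1)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()

# The reported CBFT fine-tuning stage (20 epochs, initial LR 0.01, cosine decay)
# would pair an SGD optimizer with a matching scheduler, e.g.:
# ft_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# ft_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(ft_optimizer, T_max=20)
```

The CBFT objective itself (including its λ1 term) is not sketched here, since its equations are not reproduced in this summary.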