Lookbehind-SAM: k steps back, 1 step forward
Authors: Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experimental Results: In this section, we start by introducing our baselines (Section 4.1), and then we conduct several experiments to showcase the benefits of achieving a better sharpness-loss tradeoff in SAM methods. Particularly, we test the generalization performance on several models and datasets (Section 4.2) and analyze the loss landscapes at the end of training in terms of sharpness (Section 4.3). Then, we study the robustness provided by the different methods in noisy weight settings (Section 4.4). Lastly, we assess continual learning in sequential training settings (Section 4.5). |
| Researcher Affiliation | Collaboration | 1 Mila Quebec AI Institute, 2 Polytechnique Montreal, 3 Samsung SAIT AI Lab Montreal, 4 Canada CIFAR AI Chair. |
| Pseudocode | Yes | The pseudo-code for Lookbehind is in Algorithm 1. ... Algorithm 1 Lookbehind-SAM (a hedged sketch of this update loop appears after the table) |
| Open Source Code | Yes | Our code is available at https://github.com/chandar-lab/Lookbehind-SAM. |
| Open Datasets | Yes | We use residual networks (ResNets) (He et al., 2016) and wide residual networks (WRN) (Zagoruyko & Komodakis, 2016) models trained from scratch on CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | Table 1: Generalization performance (validation acc. %) of the different methods on several models and datasets. |
| Hardware Specification | Yes | We trained the CIFAR-10/100 models using one RTX8000 NVIDIA GPU and 1 CPU core, and the ImageNet models using one A100 GPU (with 40 and 80 GB of memory for training from scratch and fine-tuning, respectively) and 6 CPU cores. |
| Software Dependencies | No | The paper mentions optimizers such as Adam and SGD and refers to frameworks such as PyTorch (implicitly, through common usage) and fairseq, but it provides no version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | For CIFAR-10/100, we trained each model for 200 epochs with a batch size of 128, starting with a learning rate of 0.1 and dividing it by 10 every 50 epochs. All models were trained using SGD with momentum set to 0.9 and weight decay of 1e-4. (see the configuration sketch after the table) |
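
The pseudocode row above points to Algorithm 1 (Lookbehind-SAM). Below is a minimal PyTorch-style sketch of what such an update could look like, inferred only from the paper's title ("k steps back, 1 step forward") and the fact that Algorithm 1 combines multi-step SAM ascent with a Lookahead-style slow-weight update. The function name `lookbehind_sam_step` and the hyperparameters `k`, `rho`, `lr`, and `alpha` are illustrative assumptions, not the authors' API; the official implementation is in the linked repository.

```python
import torch


def lookbehind_sam_step(model, loss_fn, batch, k=5, rho=0.05, lr=0.1, alpha=0.5):
    """Illustrative Lookbehind-style update: k ascent ("back") steps, one interpolated ("forward") step.

    Assumes every parameter of `model` receives a gradient from `loss_fn`.
    """
    inputs, targets = batch
    slow = [p.detach().clone() for p in model.parameters()]  # slow weights, fixed during the inner loop
    fast = [p.detach().clone() for p in model.parameters()]  # fast weights, updated k times
    eps = [torch.zeros_like(p) for p in model.parameters()]  # accumulated SAM perturbation

    for _ in range(k):
        # Evaluate the gradient at the perturbed slow weights (the current ascent point).
        for p, s, e in zip(model.parameters(), slow, eps):
            p.data.copy_(s + e)
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]

        # Ascent ("step back"): grow the perturbation along the normalized gradient.
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        for e, g in zip(eps, grads):
            e.add_(rho * g / (grad_norm + 1e-12))

        # Descent: apply the same gradient to the fast weights.
        for f, g in zip(fast, grads):
            f.sub_(lr * g)

    # "1 step forward": interpolate the slow weights toward the final fast weights.
    for p, s, f in zip(model.parameters(), slow, fast):
        p.data.copy_(s + alpha * (f - s))
```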
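
For the experiment-setup row, here is a hedged sketch of the quoted CIFAR-10/100 training configuration (200 epochs, batch size 128, SGD with momentum 0.9 and weight decay 1e-4, learning rate 0.1 divided by 10 every 50 epochs). The paper does not name framework versions, so the use of PyTorch's `SGD` and `StepLR` classes is an assumption about how this schedule would typically be implemented.

```python
import torch


def make_optimizer_and_scheduler(model):
    # SGD with the reported momentum and weight decay, starting at lr 0.1.
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,
        momentum=0.9,
        weight_decay=1e-4,
    )
    # Divide the learning rate by 10 every 50 epochs.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
    return optimizer, scheduler


# Assumed training-loop shape (train_loader would use batch size 128):
# for epoch in range(200):
#     for batch in train_loader:
#         ...one update step (e.g., the Lookbehind-style sketch above)...
#     scheduler.step()
```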