SAMPa: Sharpness-aware Minimization Parallelized

Authors: Wanyun Xie, Thomas Pethick, Volkan Cevher

NeurIPS 2024

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM response quoting the paper's supporting evidence.
Research Type: Experimental. "Empirical results show that SAMPa ranks among the most efficient variants of SAM in terms of computational time. Additionally, our method consistently outperforms SAM across both vision and language tasks."
Researcher Affiliation: Academia. "Wanyun Xie, EPFL (LIONS), wanyun.xie@epfl.ch; Thomas Pethick, EPFL (LIONS), thomas.pethick@epfl.ch; Volkan Cevher, EPFL (LIONS), volkan.cevher@epfl.ch"
Pseudocode: Yes. "Algorithm 1: SAM Parallelized (SAMPa)"
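
The report does not reproduce Algorithm 1 itself. For orientation, here is a minimal PyTorch-style sketch of the standard, sequential SAM update that SAMPa parallelizes; the function name, the rho default, and the gradient-norm handling are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the sequential SAM update that SAMPa parallelizes;
# names and defaults are illustrative assumptions, not the paper's code.
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # Gradient 1: at the current weights w.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))

    # Ascent step: perturb to w + rho * g / ||g|| (SAM's inner maximization).
    with torch.no_grad():
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # Gradient 2: at the perturbed weights. In vanilla SAM this must wait
    # for gradient 1 to finish; this is the serial dependency SAMPa removes.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Restore w, then descend using the perturbed-point gradient.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
```

SAMPa restructures this step so the two gradient evaluations no longer have to run back-to-back; the precise parallel update, including the λ-weighted gradient combination used in SAMPa-λ, is given in Algorithm 1 of the paper.
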
Open Source Code: Yes. "Our code is available at https://github.com/LIONS-EPFL/SAMPa."
Open Datasets: Yes. "We use the CIFAR-10 and CIFAR-100 datasets [Krizhevsky et al., 2009], both consisting of 50 000 training images of size 32×32, with 10 and 100 classes, respectively. [...] We evaluate SAM and SAMPa-0.2 on ImageNet-1K [Russakovsky et al., 2015] [...] In particular, we use the BERT-base model and finetune it on the GLUE datasets [Wang et al., 2018]."
Dataset Splits: Yes. "Training data is randomly partitioned into 90% for training and 10% for validation."
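
As a rough illustration of that split, the following hypothetical snippet partitions CIFAR-10's 50 000 training images 90/10; the use of torchvision, random_split, and the fixed seed are assumptions rather than the paper's implementation.

```python
# Hypothetical reconstruction of the quoted 90%/10% split; torchvision,
# random_split, and the seed value are assumptions, not the authors' code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_train = int(0.9 * len(full_train))   # 45 000 images for training
n_val = len(full_train) - n_train      # 5 000 images for validation
train_set, val_set = random_split(full_train, [n_train, n_val],
                                  generator=torch.Generator().manual_seed(0))
```
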
Hardware Specification: Yes. "All experiments are conducted on an NVIDIA A100 GPU."
Software Dependencies: No. "The paper does not provide specific version numbers for software dependencies or libraries used in the experiments."
Experiment Setup: Yes. "The models are trained using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 5×10⁻⁴, both as a baseline and as the base model for SAM variants. We used a batch size of 128 and a cosine learning rate schedule that starts at 0.1. The number of epochs is set to 200 for SAM and SAMPa, while SGD is given 400 epochs. [...] Label smoothing with a factor of 0.1 is employed for all methods. [...] SAM is assigned ρ values of 0.05 and 0.1 on CIFAR-10 and CIFAR-100 respectively, which is consistent with existing works [Foret et al., 2021, Kwon et al., 2021]. Moreover, SAMPa-0 shares the same ρ value as SAM, while SAMPa-0.2 is configured with twice the value of SAM's ρ. Additionally, λ for SAMPa-λ is set at 0.2 through a grid search from 0 to 1, with intervals of 0.1..."
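
The quoted hyperparameters map directly onto a standard PyTorch training setup. The sketch below wires them together, assuming a ResNet-18 backbone and per-epoch scheduler steps (neither is pinned down in the quote); the batch size of 128 would be passed to the DataLoader.

```python
# Sketch of the quoted training configuration; the ResNet-18 backbone and
# per-epoch scheduler stepping are assumptions, not stated in the quote.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)                      # backbone assumed
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
epochs = 200                                          # 400 for the plain-SGD baseline
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
# batch size 128 goes to DataLoader(train_set, batch_size=128, shuffle=True)
rho = 0.05   # SAM's rho on CIFAR-10 (0.1 on CIFAR-100; doubled for SAMPa-0.2)
lam = 0.2    # lambda for SAMPa-lambda, from the grid search over [0, 1]
```
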