SAMPa: Sharpness-aware Minimization Parallelized
Authors: Wanyun Xie, Thomas Pethick, Volkan Cevher
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that SAMPa ranks among the most efficient variants of SAM in terms of computational time. Additionally, our method consistently outperforms SAM across both vision and language tasks. |
| Researcher Affiliation | Academia | Wanyun Xie, EPFL (LIONS), wanyun.xie@epfl.ch; Thomas Pethick, EPFL (LIONS), thomas.pethick@epfl.ch; Volkan Cevher, EPFL (LIONS), volkan.cevher@epfl.ch |
| Pseudocode | Yes | Algorithm 1 SAM Parallelized (SAMPa) |
| Open Source Code | Yes | Our code is available at https://github.com/LIONS-EPFL/SAMPa. |
| Open Datasets | Yes | We use the CIFAR-10 and CIFAR-100 datasets [Krizhevsky et al., 2009], both consisting of 50 000 training images of size 32×32, with 10 and 100 classes, respectively. [...] We evaluate SAM and SAMPa-0.2 on ImageNet-1K [Russakovsky et al., 2015] [...] In particular, we use the BERT-base model and finetune it on the GLUE datasets [Wang et al., 2018]. |
| Dataset Splits | Yes | Training data is randomly partitioned into 90% for training and 10% for validation. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The models are trained using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 5×10⁻⁴, both as a baseline and as the base model for SAM variants. We used a batch size of 128 and a cosine learning rate schedule that starts at 0.1. The number of epochs is set to 200 for SAM and SAMPa while SGD is given 400 epochs. [...] Label smoothing with a factor of 0.1 is employed for all methods. [...] SAM is assigned ρ values of 0.05 and 0.1 on CIFAR-10 and CIFAR-100 respectively, which is consistent with existing works [Foret et al., 2021, Kwon et al., 2021]. Moreover, SAMPa-0 shares the same ρ value as SAM while SAMPa-0.2 is configured with twice the value of SAM's ρ. Additionally, λ for SAMPa-λ is set at 0.2 through a grid search from 0 to 1, with intervals of 0.1... |
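
The Pseudocode row confirms that the paper provides Algorithm 1 (SAM Parallelized, SAMPa), but the algorithm itself is not reproduced in the excerpts above. As background, the sketch below shows the standard sequential SAM step of Foret et al. [2021], whose two gradient evaluations SAMPa is designed to decouple so they can run in parallel. This is a minimal illustration only: the function name `sam_step` and its signature are assumptions and are not taken from the authors' released code.

```python
import torch

def sam_step(model, inputs, targets, loss_fn, base_optimizer, rho=0.05):
    """One sequential SAM update (Foret et al., 2021): ascend to a nearby
    worst-case point, then descend using the gradient evaluated there."""
    # First forward/backward pass: gradient at the current weights.
    loss_fn(model(inputs), targets).backward()

    # Perturb each parameter by rho * g / ||g|| (global gradient norm).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads])) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)
    model.zero_grad()

    # Second forward/backward pass: the gradient at the perturbed weights
    # is the one that drives the actual parameter update.
    loss_fn(model(inputs), targets).backward()

    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)  # restore the unperturbed weights
    base_optimizer.step()   # e.g. SGD with momentum 0.9
    base_optimizer.zero_grad()
```

Because the second gradient depends on the first, the two passes cannot overlap in this sequential form; removing that dependency so the two gradient computations can run in parallel is, per the title and abstract, what the paper's Algorithm 1 targets.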
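
The Experiment Setup row maps onto a conventional PyTorch training configuration. The sketch below is illustrative only: the quoted hyperparameters (SGD with momentum 0.9 and weight decay 5×10⁻⁴, initial learning rate 0.1 with a cosine schedule, batch size 128, label smoothing 0.1, 200 epochs for SAM/SAMPa) come from the row above, while the helper name `build_training_setup` and the ResNet-18 placeholder model are hypothetical and not specified in the excerpt.

```python
from torch import nn, optim
import torchvision

def build_training_setup(model: nn.Module, epochs: int = 200):
    # Base optimizer shared by the SGD baseline and the SAM variants.
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Cosine learning-rate schedule starting at 0.1.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Cross-entropy with label smoothing factor 0.1.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    return optimizer, scheduler, criterion

# Placeholder model; the excerpt does not name the CIFAR architectures used.
model = torchvision.models.resnet18(num_classes=10)
optimizer, scheduler, criterion = build_training_setup(model, epochs=200)
# Batch size is 128; data loaders would follow the 90%/10% train/validation
# split noted in the Dataset Splits row. rho = 0.05 (CIFAR-10) or 0.1
# (CIFAR-100) for SAM; SAMPa-0.2 doubles SAM's rho and uses lambda = 0.2.
```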