SAMPa: Sharpness-aware Minimization Parallelized
Authors: Wanyun Xie, Thomas Pethick, Volkan Cevher
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that SAMPa ranks among the most efficient variants of SAM in terms of computational time. Additionally, our method consistently outperforms SAM across both vision and language tasks. |
| Researcher Affiliation | Academia | Wanyun Xie, EPFL (LIONS), wanyun.xie@epfl.ch; Thomas Pethick, EPFL (LIONS), thomas.pethick@epfl.ch; Volkan Cevher, EPFL (LIONS), volkan.cevher@epfl.ch |
| Pseudocode | Yes | Algorithm 1 SAM Parallelized (SAMPa) |
| Open Source Code | Yes | Our code is available at https://github.com/LIONS-EPFL/SAMPa. |
| Open Datasets | Yes | We use the CIFAR-10 and CIFAR-100 datasets [Krizhevsky et al., 2009], both consisting of 50 000 training images of size 32×32, with 10 and 100 classes, respectively. [...] We evaluate SAM and SAMPa-0.2 on ImageNet-1K [Russakovsky et al., 2015] [...] In particular, we use the BERT-base model and finetune it on the GLUE datasets [Wang et al., 2018]. |
| Dataset Splits | Yes | Training data is randomly partitioned into 90% for training and 10% for validation. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The models are trained using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 5×10⁻⁴, both as a baseline and as the base model for SAM variants. We used a batch size of 128 and a cosine learning rate schedule that starts at 0.1. The number of epochs is set to 200 for SAM and SAMPa while SGD is given 400 epochs. [...] Label smoothing with a factor of 0.1 is employed for all methods. [...] SAM is assigned ρ values of 0.05 and 0.1 on CIFAR-10 and CIFAR-100 respectively, which is consistent with existing works [Foret et al., 2021, Kwon et al., 2021]. Moreover, SAMPa-0 shares the same ρ value as SAM while SAMPa-0.2 is configured with twice the value of SAM's ρ. Additionally, λ for SAMPa-λ is set at 0.2 through a grid search from 0 to 1, with intervals of 0.1... |
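
The Pseudocode row confirms that the paper provides Algorithm 1 (SAM Parallelized, SAMPa), but the algorithm itself is not reproduced in the excerpts above. As background, the sketch below shows the standard sequential SAM step of Foret et al. [2021], whose two gradient evaluations SAMPa is designed to decouple so they can run in parallel. This is a minimal illustration only: the function name `sam_step` and its signature are assumptions and are not taken from the authors' released code.

```python
import torch

def sam_step(model, inputs, targets, loss_fn, base_optimizer, rho=0.05):
    """One sequential SAM update (Foret et al., 2021): ascend to a nearby
    worst-case point, then descend using the gradient evaluated there."""
    # First forward/backward pass: gradient at the current weights.
    loss_fn(model(inputs), targets).backward()

    # Perturb each parameter by rho * g / ||g|| (global gradient norm).
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads])) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)
    model.zero_grad()

    # Second forward/backward pass: the gradient at the perturbed weights
    # is the one that drives the actual parameter update.
    loss_fn(model(inputs), targets).backward()

    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)  # restore the unperturbed weights
    base_optimizer.step()   # e.g. SGD with momentum 0.9
    base_optimizer.zero_grad()
```

Because the second gradient depends on the first, the two passes cannot overlap in this sequential form; removing that dependency so the two gradient computations can run in parallel is, per the title and abstract, what the paper's Algorithm 1 targets.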
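
The Experiment Setup row maps onto a conventional PyTorch training configuration. The sketch below is illustrative only: the quoted hyperparameters (SGD with momentum 0.9 and weight decay 5×10⁻⁴, initial learning rate 0.1 with a cosine schedule, batch size 128, label smoothing 0.1, 200 epochs for SAM/SAMPa) come from the row above, while the helper name `build_training_setup` and the ResNet-18 placeholder model are hypothetical and not specified in the excerpt.

```python
from torch import nn, optim
import torchvision

def build_training_setup(model: nn.Module, epochs: int = 200):
    # Base optimizer shared by the SGD baseline and the SAM variants.
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # Cosine learning-rate schedule starting at 0.1.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Cross-entropy with label smoothing factor 0.1.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    return optimizer, scheduler, criterion

# Placeholder model; the excerpt does not name the CIFAR architectures used.
model = torchvision.models.resnet18(num_classes=10)
optimizer, scheduler, criterion = build_training_setup(model, epochs=200)
# Batch size is 128; data loaders would follow the 90%/10% train/validation
# split noted in the Dataset Splits row. rho = 0.05 (CIFAR-10) or 0.1
# (CIFAR-100) for SAM; SAMPa-0.2 doubles SAM's rho and uses lambda = 0.2.
```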