Towards Understanding Sharpness-Aware Minimization

Authors: Maksym Andriushchenko, Nicolas Flammarion

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
Researcher Affiliation | Academia | EPFL, Switzerland. Correspondence to: Maksym Andriushchenko <maksym.andriushchenko@epfl.ch>.
Pseudocode | No | The paper provides mathematical equations for update rules (e.g., Eq. 4, 7, 8, 9, 10, 11) and describes algorithmic steps in text, but it does not include a distinct pseudocode block or algorithm listing.
Open Source Code | Yes | The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
Open Datasets | Yes | We use ResNet-18 on CIFAR-10 and ResNet-34 on CIFAR-100 (Krizhevsky & Hinton, 2009) with standard data augmentation and batch size 128 and refer to App. D for full experimental details, including our implementation of n-SAM.
Dataset Splits | No | The paper mentions training and testing data but does not explicitly describe a validation split (e.g., specific percentages or sample counts).
Hardware Specification | Yes | We perform all our experiments with deep networks on a single NVIDIA V100 GPU with 32GB of memory.
Software Dependencies | No | The paper mentions SGD, PyTorch (implicitly, given the deep-learning context), and the seaborn library (cited for plotting), but it does not provide version numbers for these or other key software dependencies required for reproducibility.
Experiment Setup | Yes | In all experiments, we train deep networks using SGD with step size 0.1, momentum 0.9, and ℓ2-regularization parameter λ = 0.0005. ... For all experiments involving SAM, we select the best perturbation radius ρ based on a grid search over ρ ∈ {0.025, 0.05, 0.1, 0.2, 0.3, 0.4}.
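
The Pseudocode row above notes that the paper states SAM only through its update rules rather than an algorithm listing. Below is a minimal sketch of the standard two-step SAM update (ascend to w + ρ·g/‖g‖, then descend with the gradient taken at the perturbed weights), written as an illustrative PyTorch re-implementation; the function name, interface, and numerical-stability constant are assumptions, not the authors' released code.

```python
# Minimal sketch of the two-step SAM update described by the paper's update
# rules. Illustrative only; not taken from tml-epfl/understanding-sam.
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    base_opt.zero_grad()

    # 1) gradient at the current weights w
    loss_fn(model(x), y).backward()

    # 2) ascent step: e = rho * g / ||g||_2, using the global gradient norm
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)  # 1e-12 avoids division by zero
            p.add_(e)
            perturbations.append(e)

    # 3) gradient at the perturbed weights w + e
    base_opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 4) undo the perturbation, then take the base SGD step with the SAM gradient
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_opt.step()
    return loss.item()
```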
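
The Experiment Setup and Open Datasets rows fix the optimizer hyperparameters, batch size, and the ρ grid. The sketch below collects them in runnable form, assuming torchvision's CIFAR-10 loader and ResNet-18 as stand-ins for the authors' data pipeline and CIFAR-style architecture; the data path and worker count are placeholders.

```python
# Training configuration reported in the table: SGD (lr=0.1, momentum=0.9,
# weight decay 5e-4), batch size 128, ResNet-18 on CIFAR-10 with standard
# augmentation, and the SAM perturbation-radius grid. Stand-in components only.
import torch
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([              # "standard data augmentation" for CIFAR-10
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

# torchvision's ImageNet-style ResNet-18 is a stand-in for the CIFAR-style one.
model = torchvision.models.resnet18(num_classes=10)
base_opt = torch.optim.SGD(model.parameters(), lr=0.1,
                           momentum=0.9, weight_decay=5e-4)

# Perturbation radii searched for SAM, as listed in the Experiment Setup row.
rho_grid = [0.025, 0.05, 0.1, 0.2, 0.3, 0.4]
```

The `base_opt` defined here is the momentum-SGD optimizer that would be passed to the `sam_step` sketch above, with one ρ value from `rho_grid` selected per run.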