Towards Understanding Sharpness-Aware Minimization

Authors: Maksym Andriushchenko, Nicolas Flammarion

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
Researcher Affiliation | Academia | EPFL, Switzerland. Correspondence to: Maksym Andriushchenko <maksym.andriushchenko@epfl.ch>.
Pseudocode | No | The paper provides mathematical equations for update rules (e.g., Eq. 4, 7, 8, 9, 10, 11) and describes algorithmic steps in text, but it does not include a distinct pseudocode block or algorithm listing.
Open Source Code | Yes | The code of our experiments is available at https://github.com/tml-epfl/understanding-sam.
Open Datasets | Yes | We use ResNet-18 on CIFAR-10 and ResNet-34 on CIFAR-100 (Krizhevsky & Hinton, 2009) with standard data augmentation and batch size 128 and refer to App. D for full experimental details, including our implementation of n-SAM.
Dataset Splits | No | The paper mentions training and testing data but does not explicitly describe a validation split (e.g., specific percentages or sample counts).
Hardware Specification | Yes | We perform all our experiments with deep networks on a single NVIDIA V100 GPU with 32GB of memory.
Software Dependencies | No | The paper mentions SGD, PyTorch (implicitly, given the deep-learning context), and the seaborn library (cited for plotting), but it does not provide version numbers for these or other key software dependencies required for reproducibility.
Experiment Setup | Yes | In all experiments, we train deep networks using SGD with step size 0.1, momentum 0.9, and ℓ2-regularization parameter λ = 0.0005. ... For all experiments involving SAM, we select the best perturbation radius ρ based on a grid search over ρ ∈ {0.025, 0.05, 0.1, 0.2, 0.3, 0.4}.
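
The Pseudocode row above notes that the paper states SAM only through its update rules rather than an algorithm listing. Below is a minimal sketch of the standard two-step SAM update (ascend to w + ρ·g/‖g‖, then descend with the gradient taken at the perturbed weights), written as an illustrative PyTorch re-implementation; the function name, interface, and numerical-stability constant are assumptions, not the authors' released code.

```python
# Minimal sketch of the two-step SAM update described by the paper's update
# rules. Illustrative only; not taken from tml-epfl/understanding-sam.
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    base_opt.zero_grad()

    # 1) gradient at the current weights w
    loss_fn(model(x), y).backward()

    # 2) ascent step: e = rho * g / ||g||_2, using the global gradient norm
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)  # 1e-12 avoids division by zero
            p.add_(e)
            perturbations.append(e)

    # 3) gradient at the perturbed weights w + e
    base_opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 4) undo the perturbation, then take the base SGD step with the SAM gradient
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    base_opt.step()
    return loss.item()
```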
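
The Experiment Setup and Open Datasets rows fix the optimizer hyperparameters, batch size, and the ρ grid. The sketch below collects them in runnable form, assuming torchvision's CIFAR-10 loader and ResNet-18 as stand-ins for the authors' data pipeline and CIFAR-style architecture; the data path and worker count are placeholders.

```python
# Training configuration reported in the table: SGD (lr=0.1, momentum=0.9,
# weight decay 5e-4), batch size 128, ResNet-18 on CIFAR-10 with standard
# augmentation, and the SAM perturbation-radius grid. Stand-in components only.
import torch
import torchvision
import torchvision.transforms as T

train_tf = T.Compose([              # "standard data augmentation" for CIFAR-10
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)

# torchvision's ImageNet-style ResNet-18 is a stand-in for the CIFAR-style one.
model = torchvision.models.resnet18(num_classes=10)
base_opt = torch.optim.SGD(model.parameters(), lr=0.1,
                           momentum=0.9, weight_decay=5e-4)

# Perturbation radii searched for SAM, as listed in the Experiment Setup row.
rho_grid = [0.025, 0.05, 0.1, 0.2, 0.3, 0.4]
```

The `base_opt` defined here is the momentum-SGD optimizer that would be passed to the `sam_step` sketch above, with one ρ value from `rho_grid` selected per run.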