Towards Understanding Sharpness-Aware Minimization
Authors: Maksym Andriushchenko, Nicolas Flammarion
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements. Finally, we provide convergence results of SAM for non-convex objectives when used with stochastic gradients. We illustrate these results empirically for deep networks and discuss their relation to the generalization behavior of SAM. The code of our experiments is available at https://github.com/tml-epfl/understanding-sam. |
| Researcher Affiliation | Academia | EPFL, Switzerland. Correspondence to: Maksym Andriushchenko <maksym.andriushchenko@epfl.ch>. |
| Pseudocode | No | The paper provides mathematical equations for update rules (e.g., Eq. 4, 7, 8, 9, 10, 11) and describes algorithmic steps in text, but it does not include a distinct pseudocode block or algorithm listing. |
| Open Source Code | Yes | The code of our experiments is available at https://github.com/tml-epfl/understanding-sam. |
| Open Datasets | Yes | We use ResNet-18 on CIFAR-10 and ResNet-34 on CIFAR-100 (Krizhevsky & Hinton, 2009) with standard data augmentation and batch size 128 and refer to App. D for full experimental details, including our implementation of n-SAM. |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly provide details about a validation set split (e.g., specific percentages or sample counts for validation). |
| Hardware Specification | Yes | We perform all our experiments with deep networks on a single NVIDIA V100 GPU with 32GB of memory. |
| Software Dependencies | No | The paper mentions using SGD, PyTorch (implied by the deep-learning context), and the seaborn library (cited for plotting), but it does not provide version numbers for these or any other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | In all experiments, we train deep networks using SGD with step size 0.1, momentum 0.9, and ℓ2-regularization parameter λ = 0.0005. ... For all experiments involving SAM, we select the best perturbation radius ρ based on a grid search over ρ ∈ {0.025, 0.05, 0.1, 0.2, 0.3, 0.4}. |
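
The setup quoted above (SGD with step size 0.1, momentum 0.9, weight decay 0.0005, batch size 128, and a grid-searched perturbation radius ρ) describes standard SAM training. The following is a minimal sketch of one SAM update step in PyTorch under those hyperparameters; it is not the authors' released implementation (see their repository above), and the tiny linear model, the fixed ρ = 0.1, and the random batch are stand-ins for the paper's ResNet/CIFAR setup.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for ResNet-18 on CIFAR-10.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = nn.CrossEntropyLoss()
rho = 0.1  # one value from the paper's grid {0.025, 0.05, 0.1, 0.2, 0.3, 0.4}

def sam_step(x, y):
    # First pass: gradient of the loss at the current weights w.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    # Ascent step: perturb each parameter by rho * grad / ||grad||.
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append((p, e))
    optimizer.zero_grad()
    # Second pass: gradient at the perturbed weights w + e.
    loss_fn(model(x), y).backward()
    # Undo the perturbation, then apply the SGD update with the SAM gradient.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()

# Example usage on a random CIFAR-10-shaped batch (batch size 128 as in the paper).
x, y = torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,))
sam_step(x, y)
```

Each SAM step therefore requires two forward-backward passes: one at the current weights to find the ascent direction, and one at the perturbed weights to compute the gradient that SGD actually applies.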