Sharpness-aware Minimization for Efficiently Improving Generalization
Authors: Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels. ... In order to assess SAM's efficacy, we apply it to a range of different tasks, including image classification from scratch (including on CIFAR-10, CIFAR-100, and ImageNet), finetuning pretrained models, and learning with noisy labels. |
| Researcher Affiliation | Industry | Pierre Foret Google Research pierreforet@google.com Ariel Kleiner Google Research akleiner@google.com Hossein Mobahi Google Research hmobahi@google.com Behnam Neyshabur Blueshift, Alphabet neyshabur@google.com |
| Pseudocode | Yes | Algorithm 1 gives pseudo-code for the full SAM algorithm, using SGD as the base optimizer, and Figure 2 schematically illustrates a single SAM parameter update. Algorithm 1: SAM algorithm. (A minimal JAX sketch of one SAM update step appears after this table.) |
| Open Source Code | Yes | We open source our code at https://github.com/google-research/sam. |
| Open Datasets | Yes | We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-{10, 100}, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Beyond CIFAR-{10, 100}, we have also evaluated SAM on the SVHN (Netzer et al., 2011) and Fashion-MNIST datasets (Xiao et al., 2017). |
| Dataset Splits | Yes | SAM has a single hyperparameter ρ (the neighborhood size), which we tune via a grid search over {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} using 10% of the training set as a validation set. We report the validation accuracy of the bootstrapped version of SAM for different levels of noise and different ρ in Table 8. (A sketch of this split-and-search procedure appears after this table.) |
| Hardware Specification | Yes | Our implementations utilize JAX (Bradbury et al., 2018), and we train all models on a single host having 8 Nvidia V100 GPUs. We train all models on ImageNet for up to 400 epochs using a Google Cloud TPUv3 and report top-1 and top-5 test error rates for each experimental condition (mean and 95% confidence interval across 5 independent runs). |
| Software Dependencies | No | This approximation to ∇_w L_S^SAM(w) can be straightforwardly computed via automatic differentiation, as implemented in common libraries such as JAX, TensorFlow, and PyTorch. While these software components are mentioned, specific version numbers are not provided. |
| Experiment Setup | Yes | All results use basic data augmentations (horizontal flip, padding by four pixels, and random crop). We also evaluate in the setting of more advanced data augmentation methods such as cutout regularization (Devries & Taylor, 2017) and AutoAugment (Cubuk et al., 2018)... We train all models on ImageNet for up to 400 epochs using a Google Cloud TPUv3 and report top-1 and top-5 test error rates for each experimental condition (mean and 95% confidence interval across 5 independent runs). Table 6: Hyper-parameters used to produce the CIFAR-{10,100} results (lists specific LR, WD, ρ values). (A sketch of the basic augmentations appears after this table.) |
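
As a companion to the pseudocode row above, here is a minimal JAX sketch of one SAM update step with SGD as the base optimizer, following Algorithm 1 and the paper's first-order approximation ε̂(w) = ρ ∇L(w)/‖∇L(w)‖₂. The names `loss_fn`, `params`, `rho`, and `lr` are placeholders for the reader's own model, not the authors' released code (which lives at the repository linked above).

```python
import jax
import jax.numpy as jnp

def sam_sgd_step(params, batch, loss_fn, rho=0.05, lr=0.1):
    """One SAM update with SGD as the base optimizer (Algorithm 1 sketch)."""
    # Gradient of the training loss at the current weights w.
    grads = jax.grad(loss_fn)(params, batch)

    # First-order worst-case perturbation: eps(w) = rho * g / ||g||_2.
    grad_norm = jnp.sqrt(sum(jnp.sum(g ** 2)
                             for g in jax.tree_util.tree_leaves(grads)))
    eps = jax.tree_util.tree_map(lambda g: rho * g / (grad_norm + 1e-12), grads)

    # Gradient at the perturbed point w + eps(w) approximates grad L_S^SAM(w);
    # this is the "computed via automatic differentiation" step quoted above.
    sam_grads = jax.grad(loss_fn)(
        jax.tree_util.tree_map(lambda p, e: p + e, params, eps), batch)

    # Plain SGD descent step using the SAM gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, sam_grads)
```

Note the two gradient evaluations per step: this is why the paper reports SAM's per-update cost as roughly twice that of the base optimizer.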
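The dataset-splits row quotes a grid search over ρ with a 10% validation holdout. A hypothetical sketch of that procedure, assuming a user-supplied `train_and_evaluate(train_idx, val_idx, rho)` that returns validation accuracy:

```python
import numpy as np

RHO_GRID = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]  # grid quoted from the paper

def tune_rho(num_examples, train_and_evaluate, seed=0):
    """Pick rho by validation accuracy on a held-out 10% of the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_examples)
    n_val = num_examples // 10                     # 10% validation split
    val_idx, train_idx = idx[:n_val], idx[n_val:]

    # Retrain once per candidate rho and keep the best-scoring value.
    scores = {rho: train_and_evaluate(train_idx, val_idx, rho)
              for rho in RHO_GRID}
    return max(scores, key=scores.get)
```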
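The experiment-setup row names the basic augmentations: horizontal flip, padding by four pixels, and random crop. A minimal JAX sketch for a single HWC image; the function name and RNG handling are assumptions, not the paper's actual input pipeline:

```python
import jax
import jax.numpy as jnp

def basic_augment(rng, image):
    """Random horizontal flip, pad by 4 pixels, then random crop back to HWC."""
    flip_rng, crop_rng = jax.random.split(rng)
    h, w, c = image.shape

    # Horizontal flip with probability 0.5.
    image = jnp.where(jax.random.bernoulli(flip_rng), image[:, ::-1, :], image)

    # Pad four pixels on each side, then take a random h x w crop.
    padded = jnp.pad(image, ((4, 4), (4, 4), (0, 0)))
    offsets = jax.random.randint(crop_rng, (2,), 0, 9)  # offsets in 0..8
    return jax.lax.dynamic_slice(padded, (offsets[0], offsets[1], 0), (h, w, c))
```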