Surrogate Gap Minimization Improves Sharpness-Aware Training
Authors: Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan, Ting Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, GSAM consistently improves generalization (e.g., +3.2% over SAM and +5.4% over AdamW on ImageNet top-1 accuracy for ViT-B/32). |
| Researcher Affiliation | Collaboration | Juntang Zhuang1 (j.zhuang@yale.edu); Boqing Gong2, Liangzhe Yuan2, Yin Cui2, Hartwig Adam2 ({bgong, lzyuan, yincui, hadam}@google.com); Nicha C. Dvornek1, Sekhar Tatikonda1, James S. Duncan1 ({nicha.dvornek, sekhar.tatikonda, james.duncan}@yale.edu); Ting Liu2 (liuti@google.com). 1 Yale University, 2 Google Research |
| Pseudocode | Yes | Algorithm 1: GSAM Algorithm (see the sketch after this table). |
| Open Source Code | Yes | Code is released at https://sites.google.com/view/gsam-iclr22/home. |
| Open Datasets | Yes | We train on the ImageNet-1k (Deng et al., 2009) training set using the Inception-style (Szegedy et al., 2015) pre-processing without extra training data or strong augmentation. |
| Dataset Splits | No | The paper mentions training on 'ImageNet-1k' and evaluating on other ImageNet variants (v1, v2, Real), and discusses hyperparameter searching, but it does not explicitly describe a validation dataset split or its size/proportion. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, or TPUs) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'AdamW optimizer', 'SGD with momentum', and 'TensorFlow implementation', but does not provide specific version numbers for any software components or libraries. |
| Experiment Setup | Yes | For all models, we search for the best learning rate and weight decay for vanilla training, and then use the same values for the experiments with SAM and GSAM. For ResNets, we search for ρ from 0.01 to 0.05 with a step size of 0.01. For ViTs and Mixers, we search for ρ from 0.05 to 0.6 with a step size of 0.05. In GSAM, we search for α in {0.01, 0.02, 0.03} for ResNets and α in {0.1, 0.2, 0.3} for ViTs and Mixers. We summarize the best hyper-parameters for each model in Appendix B. (Example grids are sketched below the table.) |
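
The GSAM update in Algorithm 1 combines the SAM perturbed gradient with a correction along the component of the original gradient that is orthogonal to it, which shrinks the surrogate gap f_p - f. Below is a minimal single-step sketch written from the paper's description, in PyTorch rather than the authors' released TensorFlow code (linked above); `model`, `loss_fn`, `base_optimizer`, and the default `rho`/`alpha` values are illustrative placeholders.

```python
import torch

def gsam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05, alpha=0.1):
    """One GSAM update: SAM perturbation plus a surrogate-gap correction (sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the original loss f(w).
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)

    # Move to the approximate worst-case point w + eps inside an L2 ball of radius rho.
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # Gradient of the perturbed loss f_p(w) = f(w + eps).
    perturbed_loss = loss_fn(model(inputs), targets)
    perturbed_grads = torch.autograd.grad(perturbed_loss, params)

    # Restore the original weights.
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)

    # Decompose the original gradient into parts parallel and orthogonal to the
    # perturbed gradient; the final direction descends on f_p while the
    # orthogonal part (scaled by alpha) decreases the surrogate gap f_p - f.
    dot = sum((g * gp).sum() for g, gp in zip(grads, perturbed_grads))
    gp_norm_sq = sum((gp ** 2).sum() for gp in perturbed_grads) + 1e-12
    with torch.no_grad():
        for p, g, gp in zip(params, grads, perturbed_grads):
            g_parallel = (dot / gp_norm_sq) * gp
            g_orth = g - g_parallel
            p.grad = gp - alpha * g_orth  # final GSAM update direction

    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

Calling `gsam_step(...)` once per mini-batch plugs GSAM on top of any base optimizer (e.g., `torch.optim.SGD` or AdamW). The paper additionally decays ρ along the learning-rate schedule; that scheduling is omitted from this sketch.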
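
For concreteness, the search ranges quoted in the Experiment Setup row translate to grids like the following (variable names are illustrative, not taken from the released code):

```python
import numpy as np

# Perturbation-radius (rho) grids, per the quoted ranges and step sizes.
rho_grid_resnet = np.round(np.arange(0.01, 0.05 + 1e-9, 0.01), 2)  # 0.01, 0.02, ..., 0.05
rho_grid_vit    = np.round(np.arange(0.05, 0.60 + 1e-9, 0.05), 2)  # 0.05, 0.10, ..., 0.60 (ViTs and Mixers)

# Grids for GSAM's surrogate-gap coefficient alpha.
alpha_grid_resnet = [0.01, 0.02, 0.03]
alpha_grid_vit    = [0.1, 0.2, 0.3]
```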