An Adaptive Policy to Employ Sharpness-Aware Minimization

Authors: Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy. In this section, we evaluate the proposed AE-SAM and AE-LookSAM on several standard benchmarks. Experiments are performed on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009) using four network architectures: ResNet-18 (He et al., 2016), Wide ResNet-28-10 (denoted WRN-28-10) (Zagoruyko & Komodakis, 2016), PyramidNet-110 (Han et al., 2017), and ViT-S16 (Dosovitskiy et al., 2021).
Researcher Affiliation | Academia | 1 Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology; 2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology; 3 Peng Cheng Laboratory
Pseudocode | Yes | Algorithm 1: AE-SAM and AE-LookSAM.
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a repository link or an explicit release statement).
Open Datasets | Yes | Experiments are performed on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... we perform experiments on ImageNet (Russakovsky et al., 2015)...
Dataset Splits | Yes | 10% of the training set is used as the validation set.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experiments.
Software Dependencies | No | The paper mentions software such as the SGD optimizer and PyTorch distributed training, but does not specify version numbers for any key software components.
Experiment Setup | Yes | Following the setup in (Liu et al., 2022; Foret et al., 2021; Zhao et al., 2022a), we use batch size 128, initial learning rate of 0.1, cosine learning rate schedule, SGD optimizer with momentum 0.9 and weight decay 0.0001. The number of training epochs is 300 for PyramidNet-110, 1200 for ViT-S16, and 200 for ResNet-18 and Wide ResNet-28-10. 10% of the training set is used as the validation set. As in Foret et al. (2021), we perform grid search for the radius ρ over {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} using the validation set. Similarly, α is selected by grid search over {0.1, 0.3, 0.6, 0.9}. For the c_t schedule g_{λ1,λ2}(t), λ1 = 1 and λ2 = 1 for AE-SAM; λ1 = 0 and λ2 = 2 for AE-LookSAM. (See the code sketch after this table.)
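To make the Pseudocode and Experiment Setup rows concrete, below is a minimal PyTorch sketch of an adaptive SAM/SGD training loop using the quoted hyperparameters (SGD with momentum 0.9, weight decay 0.0001, initial learning rate 0.1, cosine schedule). The switching criterion, the EMA tracking of squared-gradient-norm statistics, and the linear form of the c_t schedule are assumptions made for illustration; the paper's exact rule is its Algorithm 1, and helper names such as `sam_ascent` and `grad_norm_sq` are hypothetical.

```python
import torch
import torch.nn.functional as F


def grad_norm_sq(params):
    """Squared L2 norm of the current gradients."""
    return sum((p.grad ** 2).sum() for p in params if p.grad is not None)


def sam_ascent(params, rho):
    """Perturb parameters toward the approximate worst case within an L2 ball of radius rho.

    Returns the perturbations so they can be undone after the second forward/backward pass.
    """
    norm = torch.sqrt(grad_norm_sq(params)) + 1e-12
    perturbations = []
    for p in params:
        e = rho * p.grad / norm if p.grad is not None else None
        if e is not None:
            p.data.add_(e)
        perturbations.append(e)
    return perturbations


def train(model, loader, epochs, rho=0.05, ema_decay=0.9, lam1=1.0, lam2=1.0):
    params = list(model.parameters())
    # Optimizer and learning-rate schedule as quoted in the Experiment Setup row.
    opt = torch.optim.SGD(params, lr=0.1, momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)

    mu, var = 0.0, 1.0                      # running mean/variance of ||g||^2 (assumed form)
    step, total = 0, epochs * len(loader)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            g2 = grad_norm_sq(params).item()

            # EMA estimates of the squared gradient norm statistics (assumption).
            mu = ema_decay * mu + (1 - ema_decay) * g2
            var = ema_decay * var + (1 - ema_decay) * (g2 - mu) ** 2

            # Linearly decreasing threshold c_t; lam1 = lam2 = 1 corresponds to the
            # AE-SAM setting quoted above (the exact g_{λ1,λ2} schedule is in the paper).
            c_t = lam1 - (lam1 + lam2) * step / total

            if g2 >= mu + c_t * var ** 0.5:
                # "Sharp" iteration: take a SAM step (ascent, second backward, restore).
                perturbations = sam_ascent(params, rho)
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                for p, e in zip(params, perturbations):
                    if e is not None:
                        p.data.sub_(e)
            # Otherwise the gradient from the first pass is kept: a plain SGD step.
            opt.step()
            step += 1
        sched.step()
```

In this sketch, the extra forward/backward pass of SAM is spent only on iterations flagged as sharp by the threshold; all other iterations reduce to standard SGD. The batch size (128) and per-architecture epoch counts come from the quoted setup, and ρ (and α for AE-LookSAM) would be chosen by the grid searches listed in the table.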