An Adaptive Policy to Employ Sharpness-Aware Minimization
Authors: Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy. In this section, we evaluate the proposed AE-SAM and AE-LookSAM on several standard benchmarks. Experiments are performed on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009) using four network architectures: ResNet-18 (He et al., 2016), Wide ResNet-28-10 (denoted WRN-28-10) (Zagoruyko & Komodakis, 2016), PyramidNet-110 (Han et al., 2017), and ViT-S16 (Dosovitskiy et al., 2021). |
| Researcher Affiliation | Academia | 1 Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology; 2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology; 3 Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1: AE-SAM and AE-LookSAM (an illustrative sketch follows the table). |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., repository link or explicit release statement). |
| Open Datasets | Yes | Experiments are performed on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky & Hinton, 2009)... we perform experiments on the ImageNet (Russakovsky et al., 2015)... |
| Dataset Splits | Yes | 10% of the training set is used as the validation set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for running experiments. |
| Software Dependencies | No | The paper mentions software like 'SGD optimizer' and 'PyTorch distributed' but does not specify version numbers for any key software components. |
| Experiment Setup | Yes | Following the setup in (Liu et al., 2022; Foret et al., 2021; Zhao et al., 2022a), we use batch size 128, initial learning rate of 0.1, cosine learning rate schedule, SGD optimizer with momentum 0.9 and weight decay 0.0001. The number of training epochs is 300 for PyramidNet-110, 1200 for ViT-S16, and 200 for ResNet-18 and Wide ResNet-28-10. 10% of the training set is used as the validation set. As in Foret et al. (2021), we perform grid search for the radius ρ over {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} using the validation set. Similarly, α is selected by grid search over {0.1, 0.3, 0.6, 0.9}. For the c_t schedule g_{λ1,λ2}(t), λ1 = 1 and λ2 = 1 for AE-SAM; λ1 = 0 and λ2 = 2 for AE-LookSAM. (A configuration sketch also follows the table.) |
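
The "Pseudocode" row points to Algorithm 1 (AE-SAM and AE-LookSAM), which this summary does not reproduce. The sketch below is only an illustrative PyTorch-style rendering of the general idea, under the assumption that the adaptive policy compares the squared gradient norm against exponential-moving-average statistics to decide, per iteration, whether to take a SAM ascent-descent step or a plain SGD step. The function name `adaptive_sam_step`, the `state` dictionary, and the exact threshold rule are hypothetical, not the authors' implementation.

```python
import torch

def adaptive_sam_step(model, loss_fn, x, y, optimizer, state,
                      rho=0.05, beta=0.9, c_t=1.0):
    """One training step with an adaptive use-SAM-or-not policy.

    A minimal sketch (not the authors' code): the SAM ascent step is taken
    only when the squared gradient norm exceeds an EMA-based threshold;
    otherwise a plain SGD step is applied.
    """
    # Base gradient at the current weights.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    sq_norm = sum(g.pow(2).sum() for g in grads).item()

    # EMA estimates of the mean/variance of the squared gradient norm
    # (assumed form of the policy statistics).
    state["mean"] = beta * state.get("mean", sq_norm) + (1 - beta) * sq_norm
    state["var"] = beta * state.get("var", 0.0) + (1 - beta) * (sq_norm - state["mean"]) ** 2

    use_sam = sq_norm >= state["mean"] + c_t * state["var"] ** 0.5

    if use_sam:
        # SAM ascent step: perturb weights along the normalized gradient.
        scale = rho / (sq_norm ** 0.5 + 1e-12)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=scale)
        # Recompute the gradient at the perturbed point.
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        # Restore the original weights before the descent step.
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.sub_(g, alpha=scale)

    # Descent step uses either the base gradient or the SAM gradient.
    optimizer.step()
    return loss.item(), use_sam
```

On iterations where `use_sam` is false this costs one forward-backward pass, which is how the adaptive policy recovers efficiency relative to always-on SAM.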
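The "Experiment Setup" row also lists concrete base-optimizer hyperparameters. Below is a minimal PyTorch sketch of just that configuration (batch size 128, SGD with momentum 0.9 and weight decay 0.0001, initial learning rate 0.1, cosine learning-rate schedule over the stated number of epochs); `build_training_config`, `model`, and `train_set` are placeholder names, and the radius ρ and threshold α are not part of this base configuration since they are grid-searched on the validation split.

```python
import torch
from torch.utils.data import DataLoader

def build_training_config(model, train_set, epochs=200):
    """Sketch of the reported setup: batch size 128, SGD(momentum=0.9,
    weight_decay=1e-4), initial LR 0.1, cosine schedule over `epochs`
    (200 for ResNet-18/WRN-28-10, 300 for PyramidNet-110, 1200 for ViT-S16)."""
    loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # Cosine annealing stepped once per epoch.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return loader, optimizer, scheduler
```

Per the row above, ρ would then be grid-searched over {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} and α over {0.1, 0.3, 0.6, 0.9} using the held-out 10% validation split.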