Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Authors: Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Tan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several benchmark datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009), using various model architectures: ResNet (He et al., 2016), WideResNet (Zagoruyko & Komodakis, 2016), and PyramidNet (Han et al., 2017). |
| Researcher Affiliation | Collaboration | Jiawei Du¹,², Hanshu Yan², Jiashi Feng², Joey Tianyi Zhou¹, Liangli Zhen⁴, Rick Siow Mong Goh⁴, Vincent Y. F. Tan³,²; ¹Centre for Frontier AI Research (CFAR), A*STAR, Singapore; ²Department of Electrical and Computer Engineering, National University of Singapore; ³Department of Mathematics, National University of Singapore; ⁴Institute of High Performance Computing (IHPC), A*STAR, Singapore |
| Pseudocode | Yes (a SAM-style update sketch follows the table) | Algorithm 1 Efficient SAM (ESAM) |
| Open Source Code | Yes | Our codes are available at https://github.com/dydjw9/Efficient_SAM. |
| Open Datasets | Yes | We conduct experiments on several benchmark datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) |
| Dataset Splits | Yes | We train all the models with 3 different random seeds using a batch size of 128, weight decay 10⁻⁴ and cosine learning rate decay (Loshchilov & Hutter, 2017). The training epochs are set to be 200 for ResNet-18 (He et al., 2016), WideResNet-28-10 (Zagoruyko & Komodakis, 2016), and 300 for PyramidNet-110 (Han et al., 2017). The details of training setting are listed in Appendix A.7. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using the 'PyTorch framework' but does not specify a version number or list other software dependencies with their versions. |
| Experiment Setup | Yes (a configuration sketch follows the table) | We set all the training settings, including the maximum number of training epochs, iterations per epoch, and data augmentations, the same for fair comparison among SGD, SAM and ESAM. Additionally, the other hyperparameters of SGD, SAM and ESAM have been tuned separately for optimal test accuracies using grid search. We train all the models with 3 different random seeds using a batch size of 128, weight decay 10⁻⁴ and cosine learning rate decay (Loshchilov & Hutter, 2017). The training epochs are set to be 200 for ResNet-18 (He et al., 2016), WideResNet-28-10 (Zagoruyko & Komodakis, 2016), and 300 for PyramidNet-110 (Han et al., 2017). We set β = 0.6 and γ = 0.5 for ResNet-18 and PyramidNet-110 models; and set β = 0.5 and γ = 0.5 for WideResNet-28-10. The details of training setting are listed in Appendix A.7. |
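
The Pseudocode row refers to Algorithm 1 (ESAM) in the paper. As context for what that algorithm accelerates, below is a minimal PyTorch sketch of the base SAM two-step update: ascend along the normalized gradient, then descend using the gradient taken at the perturbed weights. This is a sketch only; the function name `sam_step` and the default `rho=0.05` are assumptions, it omits ESAM's stochastic weight perturbation (β) and sharpness-sensitive data selection (γ), and it is not the authors' released implementation.

```python
import torch


def sam_step(model, loss_fn, inputs, targets, optimizer, rho=0.05):
    """One SAM-style two-step update (sketch, not the authors' ESAM code)."""
    # First forward/backward pass: gradient at the current weights w.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturbation e(w) = rho * g / ||g||, applied to the weights in place.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    scale = rho / (grad_norm + 1e-12)
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            perturbations.append(e)

    # Second forward/backward pass: gradient at the perturbed weights w + e.
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation, then descend with the gradient from the second pass.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    optimizer.step()
    return loss.item()
```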
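
The Experiment Setup row gives the headline hyperparameters (batch size 128, weight decay 10⁻⁴, cosine learning-rate decay, 200 epochs for ResNet-18 and WideResNet-28-10, 300 for PyramidNet-110). A hedged sketch of how such a run might be configured in PyTorch is shown below; the base learning rate, momentum, data augmentation, and the use of torchvision's ImageNet-style ResNet-18 in place of the paper's CIFAR variant are assumptions not stated in the excerpt above.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Hyperparameters quoted in the Experiment Setup row (CIFAR-10 / ResNet-18 case).
BATCH_SIZE = 128
WEIGHT_DECAY = 1e-4        # "weight decay 10^-4"
EPOCHS = 200               # 300 for PyramidNet-110
BETA, GAMMA = 0.6, 0.5     # ESAM's beta / gamma for ResNet-18 (consumed inside ESAM, not shown here)
BASE_LR = 0.1              # assumed; the base learning rate is not given in the excerpt

train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

# torchvision's ResNet-18 stands in for the CIFAR ResNet-18 used in the paper.
model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR,
                            momentum=0.9,  # momentum value assumed
                            weight_decay=WEIGHT_DECAY)
# Cosine learning-rate decay over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```

In an actual run, each mini-batch would go through a SAM/ESAM update (for example, the `sam_step` sketch above), and `scheduler.step()` would be called once per epoch so that the learning rate follows the cosine schedule over the full 200 or 300 epochs.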