Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Authors: Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, Vincent Tan
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several benchmark datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009), using various model architectures: ResNet (He et al., 2016), WideResNet (Zagoruyko & Komodakis, 2016), and PyramidNet (Han et al., 2017). |
| Researcher Affiliation | Collaboration | Jiawei Du¹,², Hanshu Yan², Jiashi Feng², Joey Tianyi Zhou¹, Liangli Zhen⁴, Rick Siow Mong Goh⁴, Vincent Y. F. Tan³,²; ¹Centre for Frontier AI Research (CFAR), A*STAR, Singapore; ²Department of Electrical and Computer Engineering, National University of Singapore; ³Department of Mathematics, National University of Singapore; ⁴Institute of High Performance Computing (IHPC), A*STAR, Singapore |
| Pseudocode | Yes (a SAM-style update sketch follows the table) | Algorithm 1 Efficient SAM (ESAM) |
| Open Source Code | Yes | Our codes are available at https://github.com/dydjw9/Efficient_SAM. |
| Open Datasets | Yes | We conduct experiments on several benchmark datasets: CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009) and ImageNet (Deng et al., 2009) |
| Dataset Splits | Yes | We train all the models with 3 different random seeds using a batch size of 128, weight decay 10⁻⁴ and cosine learning rate decay (Loshchilov & Hutter, 2017). The training epochs are set to be 200 for ResNet-18 (He et al., 2016), WideResNet-28-10 (Zagoruyko & Komodakis, 2016), and 300 for PyramidNet-110 (Han et al., 2017). The details of training setting are listed in Appendix A.7. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using the 'PyTorch framework' but does not specify a version number or list other software dependencies with their versions. |
| Experiment Setup | Yes (a configuration sketch follows the table) | We set all the training settings, including the maximum number of training epochs, iterations per epoch, and data augmentations, the same for fair comparison among SGD, SAM and ESAM. Additionally, the other hyperparameters of SGD, SAM and ESAM have been tuned separately for optimal test accuracies using grid search. We train all the models with 3 different random seeds using a batch size of 128, weight decay 10⁻⁴ and cosine learning rate decay (Loshchilov & Hutter, 2017). The training epochs are set to be 200 for ResNet-18 (He et al., 2016), WideResNet-28-10 (Zagoruyko & Komodakis, 2016), and 300 for PyramidNet-110 (Han et al., 2017). We set β = 0.6 and γ = 0.5 for ResNet-18 and PyramidNet-110 models; and set β = 0.5 and γ = 0.5 for WideResNet-28-10. The details of training setting are listed in Appendix A.7. |
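
The Pseudocode row refers to Algorithm 1 (ESAM) in the paper. As context for what that algorithm accelerates, below is a minimal PyTorch sketch of the base SAM two-step update: ascend along the normalized gradient, then descend using the gradient taken at the perturbed weights. This is a sketch only; the function name `sam_step` and the default `rho=0.05` are assumptions, it omits ESAM's stochastic weight perturbation (β) and sharpness-sensitive data selection (γ), and it is not the authors' released implementation.

```python
import torch


def sam_step(model, loss_fn, inputs, targets, optimizer, rho=0.05):
    """One SAM-style two-step update (sketch, not the authors' ESAM code)."""
    # First forward/backward pass: gradient at the current weights w.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Perturbation e(w) = rho * g / ||g||, applied to the weights in place.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    scale = rho / (grad_norm + 1e-12)
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = p.grad * scale
            p.add_(e)
            perturbations.append(e)

    # Second forward/backward pass: gradient at the perturbed weights w + e.
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation, then descend with the gradient from the second pass.
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)
    optimizer.step()
    return loss.item()
```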
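
The Experiment Setup row gives the headline hyperparameters (batch size 128, weight decay 10⁻⁴, cosine learning-rate decay, 200 epochs for ResNet-18 and WideResNet-28-10, 300 for PyramidNet-110). A hedged sketch of how such a run might be configured in PyTorch is shown below; the base learning rate, momentum, data augmentation, and the use of torchvision's ImageNet-style ResNet-18 in place of the paper's CIFAR variant are assumptions not stated in the excerpt above.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Hyperparameters quoted in the Experiment Setup row (CIFAR-10 / ResNet-18 case).
BATCH_SIZE = 128
WEIGHT_DECAY = 1e-4        # "weight decay 10^-4"
EPOCHS = 200               # 300 for PyramidNet-110
BETA, GAMMA = 0.6, 0.5     # ESAM's beta / gamma for ResNet-18 (consumed inside ESAM, not shown here)
BASE_LR = 0.1              # assumed; the base learning rate is not given in the excerpt

train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True,
                     transform=transforms.ToTensor()),
    batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

# torchvision's ResNet-18 stands in for the CIFAR ResNet-18 used in the paper.
model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=BASE_LR,
                            momentum=0.9,  # momentum value assumed
                            weight_decay=WEIGHT_DECAY)
# Cosine learning-rate decay over the full training run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
```

In an actual run, each mini-batch would go through a SAM/ESAM update (for example, the `sam_step` sketch above), and `scheduler.step()` would be called once per epoch so that the learning rate follows the cosine schedule over the full 200 or 300 epochs.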