Sharpness-Aware Training for Free
Authors: Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer. |
| Researcher Affiliation | Collaboration | Jiawei Du (1,2), Daquan Zhou (3), Jiashi Feng (3), Vincent Y. F. Tan (4,2), Joey Tianyi Zhou (1,2); 1: Centre for Frontier AI Research (CFAR), A*STAR, Singapore; 2: Department of Electrical and Computer Engineering, National University of Singapore; 3: ByteDance; 4: Department of Mathematics, National University of Singapore |
| Pseudocode | Yes | Algorithm 1 Training with SAF and MESA |
| Open Source Code | Yes | Our codes are available at https://github.com/AngusDujw/SAF. |
| Open Datasets | Yes | We conduct experiments on the following image classification benchmark datasets: CIFAR-10, CIFAR-100 [16], and ImageNet [3]. |
| Dataset Splits | No | The paper states it uses the CIFAR-10, CIFAR-100, and ImageNet datasets, which commonly have predefined splits. However, it does not explicitly provide specific percentages, sample counts, or a detailed methodology for the training/validation/test splits in the main text. It mentions 'The details of the training setting are displayed in the Appendix' and 'We follow the settings of [2, 6, 20, 36] for the ImageNet datasets', but does not state the splits explicitly in the provided text. |
| Hardware Specification | Yes | The ResNets are trained with a batch size of 4096, a learning rate of 1.4, 90 training epochs, and an SGD optimizer (momentum=0.9) over 8 Nvidia V-100 GPU cards. |
| Software Dependencies | No | The paper states: 'The codes are implemented based on the TIMM framework [29].' However, it does not specify version numbers for TIMM or any other software dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | Implementation details: We set all the training hyperparameters to be the same for a fair comparison among the baselines and our proposed algorithms. The details of the training setting are displayed in the Appendix. We follow the settings of [2, 6, 20, 36] for the ImageNet datasets, which is different from the experimental setting of the original SAM paper [8]. The codes are implemented based on the TIMM framework [29]. The ResNets are trained with a batch size of 4096, a learning rate of 1.4, 90 training epochs, and an SGD optimizer (momentum=0.9) over 8 Nvidia V-100 GPU cards. The ViTs are trained with 300 training epochs and the AdamW optimizer (β1 = 0.9, β2 = 0.999). We only conduct basic data augmentation for the training on both CIFAR and ImageNet (Inception-style data augmentation). The hyperparameters of SAF and MESA are consistent across the various DNN architectures and datasets. We set τ = 5, E = 3, E_start = 5 for all the experiments, and λ = 0.3 and λ = 0.8 for SAF and MESA, respectively. |
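
To make the quoted setup concrete, the sketch below collects the reported ResNet/ImageNet hyperparameters into a PyTorch/TIMM-style training step. Only the numeric settings quoted above are taken from the paper; the model constructor, the `trajectory_loss` helper, and the `training_step` function are hedged placeholders standing in for the authors' Algorithm 1, not their actual implementation.

```python
# Minimal sketch of the reported ResNet/ImageNet training setup (PyTorch + TIMM).
# Only the quoted numbers come from the paper: batch size 4096, learning rate 1.4,
# 90 epochs, SGD with momentum 0.9, tau = 5, E = 3, E_start = 5, lambda = 0.3 (SAF).
# The model choice, data handling, and `trajectory_loss` are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F
import timm

model = timm.create_model("resnet50", num_classes=1000)  # assumed architecture
optimizer = torch.optim.SGD(model.parameters(), lr=1.4, momentum=0.9)

CONFIG = {
    "batch_size": 4096,
    "epochs": 90,
    "tau": 5.0,        # temperature applied to past/current outputs
    "E": 3,            # gap (in epochs) between recorded and current outputs
    "E_start": 5,      # epoch at which the trajectory term is switched on
    "lambda_saf": 0.3,
}


def trajectory_loss(current_logits, past_logits, tau):
    """Stand-in for SAF's trajectory loss: a KL term between temperature-softened
    past and current predictions. Illustrative assumption only."""
    past_probs = F.softmax(past_logits.detach() / tau, dim=-1)
    current_log_probs = F.log_softmax(current_logits / tau, dim=-1)
    return F.kl_div(current_log_probs, past_probs, reduction="batchmean")


def training_step(images, labels, past_logits, epoch):
    """One step: cross-entropy plus the weighted trajectory term after E_start."""
    logits = model(images)
    loss = F.cross_entropy(logits, labels)
    if epoch >= CONFIG["E_start"] and past_logits is not None:
        loss = loss + CONFIG["lambda_saf"] * trajectory_loss(
            logits, past_logits, CONFIG["tau"]
        )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()  # cache for reuse E epochs later
```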