Sharpness-Aware Training for Free

Authors: Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Tan, Joey Tianyi Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.
Researcher Affiliation | Collaboration | Jiawei Du (1,2), Daquan Zhou (3), Jiashi Feng (3), Vincent Y. F. Tan (4,2), Joey Tianyi Zhou (1,2); 1: Centre for Frontier AI Research (CFAR), A*STAR, Singapore; 2: Department of Electrical and Computer Engineering, National University of Singapore; 3: ByteDance; 4: Department of Mathematics, National University of Singapore
Pseudocode | Yes | Algorithm 1: Training with SAF and MESA. (A hedged sketch of this training loop is given after the table.)
Open Source Code | Yes | Our codes are available at https://github.com/AngusDujw/SAF.
Open Datasets | Yes | We conduct experiments on the following image classification benchmark datasets: CIFAR-10, CIFAR-100 [16], and ImageNet [3]. (A loading sketch using the standard predefined splits follows the table.)
Dataset Splits | No | The paper uses CIFAR-10, CIFAR-100, and ImageNet, which commonly have predefined splits, but it does not explicitly state percentages, sample counts, or a methodology for the training/validation/test splits in the main text. It only notes that 'The details of the training setting are displayed in the Appendix' and 'We follow the settings of [2, 6, 20, 36] for the ImageNet datasets.'
Hardware Specification | Yes | The ResNets are trained with a batch size of 4096, 1.4 learning rate, 90 training epochs, and SGD optimizer (momentum=0.9) over 8 Nvidia V100 GPU cards.
Software Dependencies | No | The paper states: 'The codes are implemented based on the TIMM framework [29].' However, it does not specify version numbers for TIMM or any other software dependencies, which are needed for reproducibility.
Experiment Setup | Yes | Implementation details: We set all the training hyperparameters to be the same for a fair comparison among the baselines and our proposed algorithms. The details of the training setting are displayed in the Appendix. We follow the settings of [2, 6, 20, 36] for the ImageNet datasets, which is different from the experimental setting of the original SAM paper [8]. The codes are implemented based on the TIMM framework [29]. The ResNets are trained with a batch size of 4096, 1.4 learning rate, 90 training epochs, and SGD optimizer (momentum=0.9) over 8 Nvidia V100 GPU cards. The ViTs are trained with 300 training epochs and the AdamW optimizer (β1 = 0.9, β2 = 0.999). We only conduct basic data augmentation for the training on both CIFAR and ImageNet (Inception-style data augmentation). The hyperparameters of SAF and MESA are consistent across the various DNN architectures and datasets. We set τ = 5, E = 3, E_start = 5 for all the experiments, and λ = 0.3 and λ = 0.8 for SAF and MESA, respectively. (These values are collected into a config sketch after the table.)
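
To make the Pseudocode row concrete, below is a minimal PyTorch-style sketch of one SAF-style training step. It is written under the assumption that the trajectory loss is a KL divergence between temperature-scaled current logits and logits recorded E epochs earlier for the same samples; the exact weighting and KL direction should be taken from Algorithm 1 in the paper, and names such as saf_step, logit_bank, and idx are illustrative rather than from the authors' code.

```python
# Hedged sketch of one SAF-style training step, not the authors' exact Algorithm 1.
# Assumption: the trajectory loss is a KL divergence between temperature-scaled
# current logits and logits recorded E epochs earlier for the same samples.
import torch
import torch.nn.functional as F

def saf_step(model, images, labels, idx, logit_bank, epoch, optimizer,
             tau=5.0, lam=0.3, E=3, E_start=5):
    # logit_bank: dict mapping epoch -> tensor of shape [num_samples, num_classes];
    # the caller is assumed to pre-allocate logit_bank[epoch] as a zero tensor.
    logits = model(images)
    loss = F.cross_entropy(logits, labels)

    # Trajectory term, active only once outputs from E epochs ago are available.
    if epoch >= E_start and (epoch - E) in logit_bank:
        past = logit_bank[epoch - E][idx]                 # logits stored E epochs ago
        p_past = F.softmax(past / tau, dim=-1)
        log_p_now = F.log_softmax(logits / tau, dim=-1)
        loss = loss + lam * F.kl_div(log_p_now, p_past, reduction="batchmean")

    # Record current outputs for reuse E epochs later (detached, so no extra
    # backward pass is needed, which is why SAF adds almost no overhead).
    logit_bank[epoch][idx] = logits.detach()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# MESA, the memory-efficient variant described in the paper, replaces the stored
# logits with the outputs of an exponential-moving-average copy of the model.
```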
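
For the Open Datasets and Dataset Splits rows, the sketch below loads the named benchmarks through torchvision with their standard predefined train/test splits. This reflects common practice rather than an explicit statement in the paper, and the augmentation shown is generic, not the paper's exact Inception-style pipeline.

```python
# Hedged sketch: loading the benchmark datasets via torchvision with their
# standard predefined splits (an assumption; the paper states no custom split).
from torchvision import datasets, transforms

basic_aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

cifar10_train = datasets.CIFAR10("data", train=True, download=True, transform=basic_aug)
cifar10_test = datasets.CIFAR10("data", train=False, download=True,
                                transform=transforms.ToTensor())
cifar100_train = datasets.CIFAR100("data", train=True, download=True, transform=basic_aug)
cifar100_test = datasets.CIFAR100("data", train=False, download=True,
                                  transform=transforms.ToTensor())

# ImageNet must be obtained separately; torchvision only reads a local copy, e.g.:
# imagenet_train = datasets.ImageNet("/path/to/imagenet", split="train", transform=...)
```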
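
Finally, the quoted Experiment Setup and Hardware Specification numbers are collected below into a single configuration dict for quick reference; the key names are illustrative and do not come from the released repository.

```python
# Hedged sketch: the quoted ImageNet/ResNet training configuration in one place.
imagenet_resnet_config = {
    "batch_size": 4096,
    "learning_rate": 1.4,
    "epochs": 90,
    "optimizer": "SGD",
    "momentum": 0.9,
    "num_gpus": 8,          # Nvidia V100 cards
    # SAF / MESA hyperparameters, shared across architectures and datasets
    "tau": 5,
    "E": 3,
    "E_start": 5,
    "lambda_saf": 0.3,
    "lambda_mesa": 0.8,
}

# ViTs instead use 300 training epochs with AdamW (beta1 = 0.9, beta2 = 0.999).
```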