Sharpness-Aware Training for Free
Authors: Jiawei Du, Daquan Zhou, Jiashi Feng, Vincent Y. F. Tan, Joey Tianyi Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer. |
| Researcher Affiliation | Collaboration | Jiawei Du (1,2), Daquan Zhou (3), Jiashi Feng (3), Vincent Y. F. Tan (4,2), Joey Tianyi Zhou (1,2); 1: Centre for Frontier AI Research (CFAR), A*STAR, Singapore; 2: Department of Electrical and Computer Engineering, National University of Singapore; 3: ByteDance; 4: Department of Mathematics, National University of Singapore |
| Pseudocode | Yes | Algorithm 1 Training with SAF and MESA |
| Open Source Code | Yes | Our codes are available at https://github.com/AngusDujw/SAF. |
| Open Datasets | Yes | We conduct experiments on the following image classification benchmark datasets: CIFAR-10, CIFAR-100 [16], and ImageNet [3]. |
| Dataset Splits | No | The paper states it uses the CIFAR-10, CIFAR-100, and ImageNet datasets, which commonly have predefined splits. However, it does not explicitly provide specific percentages, sample counts, or a detailed methodology for the training/validation/test splits in the main text. It mentions 'The details of the training setting are displayed in the Appendix' and 'We follow the settings of [2, 6, 20, 36] for the ImageNet datasets', but does not state the splits explicitly in the provided text. |
| Hardware Specification | Yes | The ResNets are trained with a batch size of 4096, a learning rate of 1.4, 90 training epochs, and an SGD optimizer (momentum=0.9) over 8 Nvidia V-100 GPU cards. |
| Software Dependencies | No | The paper states: 'The codes are implemented based on the TIMM framework [29].' However, it does not specify version numbers for TIMM or any other software dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | Implementation details: We set all the training hyperparameters to be the same for a fair comparison among the baselines and our proposed algorithms. The details of the training setting are displayed in the Appendix. We follow the settings of [2, 6, 20, 36] for the ImageNet datasets, which is different from the experimental setting of the original SAM paper [8]. The codes are implemented based on the TIMM framework [29]. The ResNets are trained with a batch size of 4096, a learning rate of 1.4, 90 training epochs, and an SGD optimizer (momentum=0.9) over 8 Nvidia V-100 GPU cards. The ViTs are trained with 300 training epochs and the AdamW optimizer (β1 = 0.9, β2 = 0.999). We only conduct basic data augmentation for the training on both CIFAR and ImageNet (Inception-style data augmentation). The hyperparameters of SAF and MESA are consistent across the various DNN architectures and datasets. We set τ = 5, E = 3, E_start = 5 for all the experiments, and λ = 0.3 and λ = 0.8 for SAF and MESA, respectively. |
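
To make the quoted setup concrete, the sketch below collects the reported ResNet/ImageNet hyperparameters into a PyTorch/TIMM-style training step. Only the numeric settings quoted above are taken from the paper; the model constructor, the `trajectory_loss` helper, and the `training_step` function are hedged placeholders standing in for the authors' Algorithm 1, not their actual implementation.

```python
# Minimal sketch of the reported ResNet/ImageNet training setup (PyTorch + TIMM).
# Only the quoted numbers come from the paper: batch size 4096, learning rate 1.4,
# 90 epochs, SGD with momentum 0.9, tau = 5, E = 3, E_start = 5, lambda = 0.3 (SAF).
# The model choice, data handling, and `trajectory_loss` are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F
import timm

model = timm.create_model("resnet50", num_classes=1000)  # assumed architecture
optimizer = torch.optim.SGD(model.parameters(), lr=1.4, momentum=0.9)

CONFIG = {
    "batch_size": 4096,
    "epochs": 90,
    "tau": 5.0,        # temperature applied to past/current outputs
    "E": 3,            # gap (in epochs) between recorded and current outputs
    "E_start": 5,      # epoch at which the trajectory term is switched on
    "lambda_saf": 0.3,
}


def trajectory_loss(current_logits, past_logits, tau):
    """Stand-in for SAF's trajectory loss: a KL term between temperature-softened
    past and current predictions. Illustrative assumption only."""
    past_probs = F.softmax(past_logits.detach() / tau, dim=-1)
    current_log_probs = F.log_softmax(current_logits / tau, dim=-1)
    return F.kl_div(current_log_probs, past_probs, reduction="batchmean")


def training_step(images, labels, past_logits, epoch):
    """One step: cross-entropy plus the weighted trajectory term after E_start."""
    logits = model(images)
    loss = F.cross_entropy(logits, labels)
    if epoch >= CONFIG["E_start"] and past_logits is not None:
        loss = loss + CONFIG["lambda_saf"] * trajectory_loss(
            logits, past_logits, CONFIG["tau"]
        )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach()  # cache for reuse E epochs later
```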