Enhancing Sharpness-Aware Optimization Through Variance Suppression

Authors: Bingcong Li, Georgios Giannakis

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments confirm the merits of the stabilized adversary in VaSSO. It is demonstrated on image classification and neural machine translation tasks that VaSSO is capable of i) improving generalizability over SAM model-agnostically; and ii) nontrivially robustifying neural networks in the presence of large label noise.
Researcher Affiliation | Academia | Bingcong Li and Georgios B. Giannakis, University of Minnesota Twin Cities, Minneapolis, MN, USA ({lixx5599, georgios}@umn.edu).
Pseudocode | Yes | Algorithm 1 Generic form of SAM (a generic SAM step is sketched below the table).
Open Source Code | Yes | Code is available at https://github.com/BingcongLi/VaSSO.
Open Datasets | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard augmentations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used (the pipeline is sketched below the table).
Dataset Splits | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard augmentations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used.
Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions the 'fairseq implementation' and specific optimizers such as 'SGD' and 'AdamW', but it does not list software dependencies with specific version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | For CIFAR10, neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained; random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used for data augmentation. The first three models are trained for 200 epochs with a batch size of 128, and PyramidNet-110 is trained for 300 epochs with a batch size of 256. A cosine learning rate schedule is applied in all settings; the first three models use an initial learning rate of 0.05, and PyramidNet uses 0.1. Weight decay is set to 0.001 for SAM, ASAM, Fisher SAM and VaSSO following (Du et al., 2022a; Mi et al., 2022), but 0.0005 for SGD. We tune ρ over {0.01, 0.05, 0.1, 0.2, 0.5} for SAM and find that ρ = 0.1 gives the best results for ResNet and WRN, while ρ = 0.05 and ρ = 0.2 suit VGG and PyramidNet best, respectively. ASAM and VaSSO adopt the same ρ as SAM; Fisher SAM uses the recommended ρ = 0.1 (Kim et al., 2022). For VaSSO, we tune θ over {0.4, 0.9} and report the best accuracy, although VaSSO with either value outperforms SAM; θ = 0.4 works best for ResNet-18 and WRN-28-10, while θ = 0.9 achieves the best accuracy in the other cases. (The resulting per-model settings are summarized in the configuration sketch below the table.)
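
The Pseudocode row above points to Algorithm 1, the generic form of SAM. Below is a minimal PyTorch sketch of that generic step; the theta-controlled gradient EMA is only an assumed reading of VaSSO's stabilized adversary, and the function name sam_step, the state dict, and the default rho are illustrative rather than the authors' implementation (see the repository linked above for the actual code).

import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.1, theta=None, state=None):
    """One generic SAM step. If theta is given, the perturbation direction is an
    exponential moving average of gradients, an assumed reading of VaSSO's
    stabilized adversary (not the authors' exact update rule)."""
    inputs, targets = batch
    model.zero_grad()

    # 1) stochastic gradient g at the current weights w
    loss_fn(model(inputs), targets).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]

    # optional variance suppression: d <- (1 - theta) * d + theta * g
    if theta is not None:
        assert state is not None, "pass a persistent dict so the EMA survives across steps"
        if "d" not in state:
            state["d"] = [g.clone() for g in grads]
        else:
            state["d"] = [(1 - theta) * d + theta * g for d, g in zip(state["d"], grads)]
        direction = state["d"]
    else:
        direction = grads

    # 2) ascend to the adversarial point w + rho * direction / ||direction||
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction)) + 1e-12
    eps = [rho * d / norm for d in direction]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) gradient at the perturbed weights drives the actual descent step
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)              # restore the original weights w
    base_optimizer.step()          # e.g. SGD with momentum and weight decay
    model.zero_grad()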
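
The Open Datasets and Dataset Splits rows list the CIFAR10 augmentations. The sketch below assembles that pipeline with torchvision; the 4-pixel crop padding, the 16-pixel cutout hole, and the normalization statistics are commonly used CIFAR10 defaults assumed here, not values quoted from the paper.

import torch
from torchvision import datasets, transforms

class Cutout:
    """Zero out one random square patch of an image tensor (DeVries and Taylor, 2017)."""
    def __init__(self, length=16):
        self.length = length

    def __call__(self, img):
        _, h, w = img.shape
        y = torch.randint(h, (1,)).item()
        x = torch.randint(w, (1,)).item()
        y1, y2 = max(0, y - self.length // 2), min(h, y + self.length // 2)
        x1, x2 = max(0, x - self.length // 2), min(w, x + self.length // 2)
        img[:, y1:y2, x1:x2] = 0.0
        return img

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),           # random crop (padding assumed)
    transforms.RandomHorizontalFlip(),              # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),  # commonly used CIFAR10 statistics
                         (0.2470, 0.2435, 0.2616)),
    Cutout(length=16),                              # cutout (hole size assumed)
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)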
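
The Experiment Setup row can be condensed into a small per-model configuration. The sketch below only restates the epochs, batch sizes, initial learning rates, weight decays, and tuned ρ/θ values quoted in that row; the variable names and layout are illustrative.

# model:             (epochs, batch, init_lr, rho,  theta)
cifar10_settings = {
    "VGG-11":         (200,    128,   0.05,    0.05, 0.9),
    "ResNet-18":      (200,    128,   0.05,    0.10, 0.4),
    "WRN-28-10":      (200,    128,   0.05,    0.10, 0.4),
    "PyramidNet-110": (300,    256,   0.10,    0.20, 0.9),
}
weight_decay = {"SAM / ASAM / Fisher SAM / VaSSO": 1e-3, "SGD": 5e-4}

# A cosine learning-rate schedule is applied in all settings, e.g.
# torch.optim.lr_scheduler.CosineAnnealingLR(base_optimizer, T_max=epochs).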