Enhancing Sharpness-Aware Optimization Through Variance Suppression
Authors: Bingcong Li, Georgios Giannakis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments confirm the merits of the stabilized adversary in VaSSO. It is demonstrated on image classification and neural machine translation tasks that VaSSO is capable of i) improving generalizability over SAM model-agnostically; and ii) nontrivially robustifying neural networks in the presence of large label noise. |
| Researcher Affiliation | Academia | Bingcong Li and Georgios B. Giannakis, University of Minnesota Twin Cities, Minneapolis, MN, USA. {lixx5599, georgios}@umn.edu |
| Pseudocode | Yes | Algorithm 1: Generic form of SAM. (A hedged code sketch of a SAM/VaSSO step is given after the table.) |
| Open Source Code | Yes | Code is available at https://github.com/BingcongLi/VaSSO. |
| Open Datasets | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard implementations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are leveraged for data augmentation. (A hedged sketch of this input pipeline follows the table.) |
| Dataset Splits | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard implementations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are leveraged for data augmentation. |
| Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'fairseq implementation' and specific optimizers like 'SGD' and 'AdamW', but it does not list any software dependencies with specific version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | For CIFAR10: neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard implementations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are leveraged for data augmentation. The first three models are trained for 200 epochs with a batch size of 128, and PyramidNet-110 is trained for 300 epochs using batch size 256. A cosine learning rate schedule is applied in all settings. The first three models use an initial learning rate of 0.05, and PyramidNet adopts 0.1. Weight decay is chosen as 0.001 for SAM, ASAM, Fisher SAM and VaSSO following (Du et al., 2022a; Mi et al., 2022), but 0.0005 for SGD. We tune ρ from {0.01, 0.05, 0.1, 0.2, 0.5} for SAM and find that ρ = 0.1 gives the best results for ResNet and WRN, while ρ = 0.05 and ρ = 0.2 suit VGG and PyramidNet best, respectively. ASAM and VaSSO adopt the same ρ as SAM. Fisher SAM uses the recommended ρ = 0.1 (Kim et al., 2022). For VaSSO, we tune θ from {0.4, 0.9} and report the best accuracy, although VaSSO with both parameters outperforms SAM. We find that θ = 0.4 works best for ResNet-18 and WRN-28-10, while θ = 0.9 achieves the best accuracy in other cases. (A hedged training-configuration sketch follows the table.) |
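
The generic SAM step in Algorithm 1 perturbs the weights along the normalized stochastic gradient before taking the base optimizer step; VaSSO stabilizes this adversary by suppressing its variance. The sketch below is a minimal PyTorch illustration, assuming the stabilized direction is an exponential moving average d_t = (1 − θ) d_{t−1} + θ g_t of the stochastic gradient, normalized to the radius ρ. The function name `vasso_step` and the `ema_state` dict are illustrative, not the authors' reference implementation (see the linked repository for that).

```python
import torch

@torch.no_grad()
def vasso_step(model, loss_fn, batch, base_opt, ema_state, rho=0.1, theta=0.4):
    """One SAM-style step with a variance-suppressed (EMA) adversary (sketch)."""
    x, y = batch

    # 1) Stochastic gradient g_t at the current weights w_t.
    with torch.enable_grad():
        loss_fn(model(x), y).backward()

    # 2) Variance suppression (assumed form): d_t = (1 - theta) * d_{t-1} + theta * g_t.
    for p in model.parameters():
        if p.grad is None:
            continue
        d = ema_state.setdefault(p, torch.zeros_like(p))
        d.mul_(1.0 - theta).add_(p.grad, alpha=theta)

    # 3) Adversarial perturbation epsilon_t = rho * d_t / ||d_t|| (global norm).
    d_norm = torch.norm(torch.stack([d.norm() for d in ema_state.values()]))
    eps = {p: d * (rho / (d_norm + 1e-12)) for p, d in ema_state.items()}
    for p, e in eps.items():
        p.add_(e)       # move to the perturbed point w_t + epsilon_t
        p.grad = None   # discard the first gradient

    # 4) Gradient at the perturbed point drives the base optimizer (e.g. SGD).
    with torch.enable_grad():
        loss = loss_fn(model(x), y)
        loss.backward()

    # 5) Undo the perturbation, then take the base optimizer step.
    for p, e in eps.items():
        p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
    return loss.detach()
```

A typical training loop would create `ema_state = {}` once and then call `vasso_step(model, criterion, (x, y), sgd, ema_state, rho=0.1, theta=0.4)` per mini-batch; setting θ = 1 makes the perturbation direction the instantaneous gradient, recovering the generic SAM adversary.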
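
The CIFAR10 augmentation pipeline quoted above (random crop, random horizontal flip, normalization, cutout) could look roughly as follows. Cutout (DeVries and Taylor, 2017) is not a stock torchvision transform, so a minimal version is included; the crop padding, the 16×16 hole size, and the normalization statistics are assumptions rather than values reported in the paper.

```python
import torch
from torchvision import transforms

class Cutout:
    """Zero out one randomly centered square patch of a (C, H, W) image tensor."""
    def __init__(self, size=16):
        self.size = size

    def __call__(self, img):
        _, h, w = img.shape
        cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
        y1, y2 = max(0, cy - self.size // 2), min(h, cy + self.size // 2)
        x1, x2 = max(0, cx - self.size // 2), min(w, cx + self.size // 2)
        img[:, y1:y2, x1:x2] = 0.0
        return img

# Assumed CIFAR10 training pipeline: crop, flip, normalize, then cutout.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    Cutout(size=16),
])
```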
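
The per-model hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. Epochs, batch sizes, initial learning rates, weight decay, ρ and θ follow the row above; the momentum value, the `CONFIGS` and `build_base_optimizer` names, and the per-epoch cosine schedule granularity are assumptions.

```python
import torch

# Values taken from the Experiment Setup row; names and structure are illustrative.
CONFIGS = {
    "vgg11":         dict(epochs=200, batch_size=128, lr=0.05, rho=0.05, theta=0.9),
    "resnet18":      dict(epochs=200, batch_size=128, lr=0.05, rho=0.10, theta=0.4),
    "wrn28_10":      dict(epochs=200, batch_size=128, lr=0.05, rho=0.10, theta=0.4),
    "pyramidnet110": dict(epochs=300, batch_size=256, lr=0.10, rho=0.20, theta=0.9),
}

def build_base_optimizer(model, cfg, sam_family=True):
    # Weight decay: 0.001 for SAM/ASAM/Fisher SAM/VaSSO, 0.0005 for plain SGD.
    wd = 1e-3 if sam_family else 5e-4
    opt = torch.optim.SGD(model.parameters(), lr=cfg["lr"],
                          momentum=0.9,          # momentum is an assumption
                          weight_decay=wd)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=cfg["epochs"])
    return opt, sched
```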