Enhancing Sharpness-Aware Optimization Through Variance Suppression

Authors: Bingcong Li, Georgios Giannakis

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments confirm the merits of the stabilized adversary in VaSSO. It is demonstrated on image classification and neural machine translation tasks that VaSSO is capable of i) improving generalizability over SAM model-agnostically; and ii) nontrivially robustifying neural networks in the presence of large label noise.
Researcher Affiliation | Academia | Bingcong Li and Georgios B. Giannakis, University of Minnesota Twin Cities, Minneapolis, MN, USA ({lixx5599, georgios}@umn.edu).
Pseudocode | Yes | Algorithm 1 Generic form of SAM (a generic SAM step is sketched below the table).
Open Source Code | Yes | Code is available at https://github.com/BingcongLi/VaSSO.
Open Datasets | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard augmentations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used (the pipeline is sketched below the table).
Dataset Splits | Yes | CIFAR10. Neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained on CIFAR10. Standard augmentations including random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used.
Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions the 'fairseq implementation' and specific optimizers such as 'SGD' and 'AdamW', but it does not list software dependencies with specific version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | For CIFAR10, neural networks including VGG-11, ResNet-18, WRN-28-10 and PyramidNet-110 are trained; random crop, random horizontal flip, normalization and cutout (DeVries and Taylor, 2017) are used for data augmentation. The first three models are trained for 200 epochs with a batch size of 128, and PyramidNet-110 is trained for 300 epochs with a batch size of 256. A cosine learning rate schedule is applied in all settings; the first three models use an initial learning rate of 0.05, and PyramidNet uses 0.1. Weight decay is set to 0.001 for SAM, ASAM, Fisher SAM and VaSSO following (Du et al., 2022a; Mi et al., 2022), but 0.0005 for SGD. We tune ρ over {0.01, 0.05, 0.1, 0.2, 0.5} for SAM and find that ρ = 0.1 gives the best results for ResNet and WRN, while ρ = 0.05 and ρ = 0.2 suit VGG and PyramidNet best, respectively. ASAM and VaSSO adopt the same ρ as SAM; Fisher SAM uses the recommended ρ = 0.1 (Kim et al., 2022). For VaSSO, we tune θ over {0.4, 0.9} and report the best accuracy, although VaSSO with either value outperforms SAM; θ = 0.4 works best for ResNet-18 and WRN-28-10, while θ = 0.9 achieves the best accuracy in the other cases. (The resulting per-model settings are summarized in the configuration sketch below the table.)
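
The Pseudocode row above points to Algorithm 1, the generic form of SAM. Below is a minimal PyTorch sketch of that generic step; the theta-controlled gradient EMA is only an assumed reading of VaSSO's stabilized adversary, and the function name sam_step, the state dict, and the default rho are illustrative rather than the authors' implementation (see the repository linked above for the actual code).

import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.1, theta=None, state=None):
    """One generic SAM step. If theta is given, the perturbation direction is an
    exponential moving average of gradients, an assumed reading of VaSSO's
    stabilized adversary (not the authors' exact update rule)."""
    inputs, targets = batch
    model.zero_grad()

    # 1) stochastic gradient g at the current weights w
    loss_fn(model(inputs), targets).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]

    # optional variance suppression: d <- (1 - theta) * d + theta * g
    if theta is not None:
        assert state is not None, "pass a persistent dict so the EMA survives across steps"
        if "d" not in state:
            state["d"] = [g.clone() for g in grads]
        else:
            state["d"] = [(1 - theta) * d + theta * g for d, g in zip(state["d"], grads)]
        direction = state["d"]
    else:
        direction = grads

    # 2) ascend to the adversarial point w + rho * direction / ||direction||
    norm = torch.sqrt(sum((d ** 2).sum() for d in direction)) + 1e-12
    eps = [rho * d / norm for d in direction]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) gradient at the perturbed weights drives the actual descent step
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)              # restore the original weights w
    base_optimizer.step()          # e.g. SGD with momentum and weight decay
    model.zero_grad()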
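
The Open Datasets and Dataset Splits rows list the CIFAR10 augmentations. The sketch below assembles that pipeline with torchvision; the 4-pixel crop padding, the 16-pixel cutout hole, and the normalization statistics are commonly used CIFAR10 defaults assumed here, not values quoted from the paper.

import torch
from torchvision import datasets, transforms

class Cutout:
    """Zero out one random square patch of an image tensor (DeVries and Taylor, 2017)."""
    def __init__(self, length=16):
        self.length = length

    def __call__(self, img):
        _, h, w = img.shape
        y = torch.randint(h, (1,)).item()
        x = torch.randint(w, (1,)).item()
        y1, y2 = max(0, y - self.length // 2), min(h, y + self.length // 2)
        x1, x2 = max(0, x - self.length // 2), min(w, x + self.length // 2)
        img[:, y1:y2, x1:x2] = 0.0
        return img

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),           # random crop (padding assumed)
    transforms.RandomHorizontalFlip(),              # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),  # commonly used CIFAR10 statistics
                         (0.2470, 0.2435, 0.2616)),
    Cutout(length=16),                              # cutout (hole size assumed)
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)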
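
The Experiment Setup row can be condensed into a small per-model configuration. The sketch below only restates the epochs, batch sizes, initial learning rates, weight decays, and tuned ρ/θ values quoted in that row; the variable names and layout are illustrative.

# model:             (epochs, batch, init_lr, rho,  theta)
cifar10_settings = {
    "VGG-11":         (200,    128,   0.05,    0.05, 0.9),
    "ResNet-18":      (200,    128,   0.05,    0.10, 0.4),
    "WRN-28-10":      (200,    128,   0.05,    0.10, 0.4),
    "PyramidNet-110": (300,    256,   0.10,    0.20, 0.9),
}
weight_decay = {"SAM / ASAM / Fisher SAM / VaSSO": 1e-3, "SGD": 5e-4}

# A cosine learning-rate schedule is applied in all settings, e.g.
# torch.optim.lr_scheduler.CosineAnnealingLR(base_optimizer, T_max=epochs).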