Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on CIFAR10, CIFAR100, and ImageNet-1K confirm the superior efficiency of our method to SAM, and the performance is preserved or even better with a perturbation of merely 50% sparsity.
Researcher Affiliation | Collaboration | (1) Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, China; (2) JD Explore Academy, Beijing, China; (3) The University of Sydney, Australia
Pseudocode | Yes | Algorithm 1 (Sparse SAM, SSAM) and Algorithm 2 (Sparse Mask Generation); a simplified sketch of the SSAM update is given after this table.
Open Source Code | Yes | Code is available at https://github.com/Mi-Peng/Sparse-Sharpness-Aware-Minimization.
Open Datasets | Yes | Datasets. We use CIFAR10/CIFAR100 [33] and ImageNet-1K [8] as the benchmarks of our method.
Dataset Splits | Yes | CIFAR10 and CIFAR100 each have 50,000 training images at 32×32 resolution and 10,000 test images. ImageNet-1K [8] is the most widely used benchmark for image classification, with 1,281,167 training images across 1,000 classes and 50,000 validation images.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., specific GPU models, CPU types, or memory amounts); the self-reported checklist also answers '[No]' for 'type of resources used'.
Software Dependencies | No | The paper mentions software frameworks only in general terms and does not give version numbers for any key components or libraries (e.g., PyTorch or Python versions).
Experiment Setup | Yes | The models on CIFAR10/CIFAR100 are trained with a batch size of 128 for 200 epochs. Random crop, random horizontal flip, normalization, and cutout [11] are applied for data augmentation, and the initial learning rate is 0.05 with a cosine learning rate schedule. The momentum and weight decay of SGD are set to 0.9 and 5e-4, respectively. SAM and SSAM use the same settings, except that the weight decay is set to 0.001 [14]. The perturbation magnitude ρ is determined from {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} via grid search; on CIFAR10 and CIFAR100 it is set to 0.1 and 0.2, respectively. For ImageNet-1K, ... ResNet is trained with a batch size of 256 and a cosine learning rate schedule with initial learning rate 0.1; the momentum and weight decay of SGD are set to 0.9 and 1e-4. SAM and SSAM use the same settings as above, with ρ set to 0.07.
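The pseudocode row above refers to the paper's Algorithm 1 (Sparse SAM). The following PyTorch-style sketch illustrates a single SSAM step under the assumption that `mask` is a precomputed list of binary (0/1) tensors aligned with the model's parameters at the target sparsity (e.g., 50%), as produced by the paper's Algorithm 2, which is not reproduced here. The names `ssam_step`, `rho`, and `mask` are illustrative and not taken from the released repository.

```python
import torch

def ssam_step(model, loss_fn, x, y, optimizer, mask, rho=0.1):
    """One SSAM step: the SAM perturbation is kept only on the
    parameter entries selected by a precomputed binary mask."""
    # First pass: gradient at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)

    # Climb to the sparsified adversarial point w + m * eps,
    # with eps = rho * g / ||g|| and m the binary mask (Algorithm 2, assumed given).
    eps_list = []
    with torch.no_grad():
        for p, m in zip(params, mask):  # mask assumed aligned with params
            eps = rho * p.grad / (grad_norm + 1e-12) * m
            p.add_(eps)
            eps_list.append(eps)
    optimizer.zero_grad()

    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()

    # Restore w and apply the base SGD update with the SAM gradient.
    with torch.no_grad():
        for p, eps in zip(params, eps_list):
            p.sub_(eps)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```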
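The experiment-setup row maps directly onto standard PyTorch optimizer and scheduler objects. The sketch below reflects the quoted hyperparameters (SGD with momentum 0.9, cosine schedule, weight decay 5e-4 or 1e-3 for SAM/SSAM runs, and the reported ρ values); `build_cifar_optimizer` and the `RHO` dictionary are hypothetical helpers for illustration, not part of the released code.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_cifar_optimizer(model, sam_weight_decay=False, epochs=200):
    """SGD + cosine schedule matching the reported CIFAR-10/100 setup:
    lr 0.05, momentum 0.9, weight decay 5e-4 (1e-3 for SAM/SSAM runs)."""
    weight_decay = 1e-3 if sam_weight_decay else 5e-4
    optimizer = SGD(model.parameters(), lr=0.05,
                    momentum=0.9, weight_decay=weight_decay)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# Perturbation magnitude rho, chosen by grid search over
# {0.01, 0.02, 0.05, 0.1, 0.2, 0.5}: 0.1 for CIFAR10, 0.2 for CIFAR100,
# and 0.07 for ImageNet-1K (where lr is 0.1, batch size 256, weight decay 1e-4).
RHO = {"cifar10": 0.1, "cifar100": 0.2, "imagenet": 0.07}
```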