Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on CIFAR10, CIFAR100, and ImageNet-1K confirm the superior efficiency of our method to SAM, and the performance is preserved or even better with a perturbation of merely 50% sparsity. |
| Researcher Affiliation | Collaboration | 1Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, China 2JD Explore Academy, Beijing, China 3The University of Sydney, Australia |
| Pseudocode | Yes | Algorithm 1 Sparse SAM (SSAM) and Algorithm 2 Sparse Mask Generation (a hedged sketch of the sparsified update step appears after the table). |
| Open Source Code | Yes | Code is available at https://github.com/Mi-Peng/Sparse-Sharpness-Aware-Minimization. |
| Open Datasets | Yes | Datasets. We use CIFAR10/CIFAR100 [33] and ImageNet-1K [8] as the benchmarks of our method. |
| Dataset Splits | Yes | CIFAR10 and CIFAR100 have 50,000 images of 32×32 resolution for training and 10,000 images for testing. ImageNet-1K [8] is the most widely used benchmark for image classification, with 1,281,167 training images across 1,000 classes and 50,000 validation images. |
| Hardware Specification | No | The paper does not specify the exact hardware components (e.g., specific GPU models, CPU types, or memory amounts) used for running the experiments. The self-reported checklist also states '[No]' for 'type of resources used'. |
| Software Dependencies | No | The paper mentions software frameworks generally but does not provide specific version numbers for any key software components or libraries (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | The models on CIFAR10/CIFAR100 are trained with a batch size of 128 for 200 epochs. We apply random crop, random horizontal flip, normalization, and cutout [11] for data augmentation, and the initial learning rate is 0.05 with a cosine learning rate schedule. The momentum and weight decay of SGD are set to 0.9 and 5e-4, respectively. SAM and SSAM apply the same settings, except that weight decay is set to 0.001 [14]. We determine the perturbation magnitude ρ from {0.01, 0.02, 0.05, 0.1, 0.2, 0.5} via grid search. On CIFAR10 and CIFAR100, we set ρ to 0.1 and 0.2, respectively. For ImageNet-1K, ...train ResNet with a batch size of 256, and adopt the cosine learning rate schedule with initial learning rate 0.1. The momentum and weight decay of SGD are set to 0.9 and 1e-4. SAM and SSAM use the same settings as above. The perturbation magnitude ρ is set to 0.07. (A hedged sketch of these settings follows the table.) |
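
The pseudocode row above refers to the paper's Algorithm 1 (Sparse SAM) and Algorithm 2 (Sparse Mask Generation). The block below is a minimal sketch of the core idea, a SAM perturbation step where the perturbation is sparsified by a binary mask, assuming a simple random mask at a fixed sparsity as a stand-in for the paper's Fisher-information (SSAM-F) and dynamic-sparse (SSAM-D) mask generators; the function name `ssam_step` and its arguments are hypothetical, not the authors' released implementation.

```python
# Minimal sketch of one sparsified SAM (SSAM-style) update in PyTorch.
# Assumes gradients are zeroed before the call; the random mask below is a
# simplification of the paper's Fisher/dynamic mask generation.
import torch

def ssam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.1, sparsity=0.5):
    # First forward/backward: gradient at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)

    perturbations = []
    with torch.no_grad():
        for p in params:
            # SAM perturbation scaled to the rho-ball, then sparsified by a
            # binary mask that keeps roughly (1 - sparsity) of the entries.
            e_w = rho * p.grad / (grad_norm + 1e-12)
            mask = (torch.rand_like(p) >= sparsity).float()  # hypothetical random mask
            e_w = e_w * mask
            p.add_(e_w)
            perturbations.append(e_w)

    # Second forward/backward: gradient at the (sparsely) perturbed weights.
    base_optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()

    with torch.no_grad():
        for p, e_w in zip(params, perturbations):
            p.sub_(e_w)  # restore the original weights

    base_optimizer.step()      # update with the sharpness-aware gradient
    base_optimizer.zero_grad()
    return loss.item()
```

Note that with `sparsity=0.5` only about half of the perturbation entries are non-zero, which is the 50%-sparsity setting the paper reports as matching or exceeding dense SAM.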
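As a companion to the experiment-setup row, the block below sketches the reported CIFAR-10/100 optimizer configuration, assuming a torchvision ResNet-18 as a stand-in model; it only illustrates the quoted hyperparameters and is not the authors' training script.

```python
# Hedged sketch of the CIFAR-10/100 training hyperparameters quoted above.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)  # placeholder backbone
epochs, batch_size = 200, 128

# SGD baseline settings; SAM/SSAM reportedly reuse them with weight decay 0.001.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Perturbation magnitudes reported after grid search over {0.01, ..., 0.5}:
rho = {"CIFAR10": 0.1, "CIFAR100": 0.2, "ImageNet-1K": 0.07}
```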