CR-SAM: Curvature Regularized Sharpness-Aware Minimization

Authors: Tao Wu, Tie Luo, Donald C. Wunsch II

AAAI 2024

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. Empirical evaluation on CIFAR and ImageNet shows that CR-SAM consistently enhances classification performance for ResNet and Vision Transformer (ViT) models. The evaluation spans a diverse range of contemporary DNN architectures, and the findings affirm that CR-SAM consistently outperforms both SAM and SGD in improving model generalizability across multiple datasets, including CIFAR-10/100 and ImageNet-1k/-C/-R.
Researcher Affiliation: Academia. Tao Wu¹, Tie Luo¹*, Donald C. Wunsch II². ¹Department of Computer Science and ²Department of Electrical and Computer Engineering, Missouri University of Science and Technology. {wuta, tluo, dwunsch}@mst.edu
Pseudocode: Yes. The full pseudocode of the CR-SAM training procedure is given in Algorithm 1 of the paper (a hedged implementation sketch appears after this checklist).
Open Source Code: Yes. The code is available at https://github.com/TrustAIoT/CR-SAM.
Open Datasets: Yes. CR-SAM is evaluated through thorough experiments on prominent image classification benchmarks: CIFAR-10/100 (Krizhevsky, Hinton et al. 2009) and ImageNet-1k (Deng et al. 2009), the latter containing 1.28 million images across 1000 classes. Evaluation is extended to out-of-distribution data, namely ImageNet-C (Hendrycks and Dietterich 2019) and ImageNet-R (Hendrycks et al. 2021). (A minimal dataset-loading sketch follows the checklist.)
Dataset Splits: No. The paper mentions conducting a grid search for hyperparameters that yield the highest test accuracy, which typically involves a validation set, but it does not explicitly provide training/validation/test split details, percentages, or a methodology for setting up a distinct validation set.
Hardware Specification: Yes. Experiments are implemented using PyTorch and executed on Nvidia A100 and V100 GPUs.
Software Dependencies: No. The paper states that experiments are "implemented using PyTorch" but does not provide version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes. On CIFAR, all models are trained from scratch for 200 epochs with batch size 128 and a cosine learning rate schedule; a grid search determines the learning rate, weight decay, perturbation magnitude (ρ), and coefficient (α and β) values that yield the highest test accuracy. On ImageNet, ResNet50 and ResNet101 are trained with batch size 512 for 90 epochs, with the initial learning rate set to 0.1 and progressively decayed using a cosine schedule. For ViT models, AdamW (Loshchilov and Hutter 2019) is adopted as the base optimizer with β1 = 0.9 and β2 = 0.999; ViTs are trained with batch size 512 for 300 epochs. (An illustrative optimizer setup sketch follows the checklist.)
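
The Algorithm 1 pseudocode itself is not reproduced in this report. For orientation, below is a minimal PyTorch sketch of a SAM-style training step with an added finite-difference curvature term, assuming the symmetric two-point perturbation scheme that CR-SAM builds on. The function name cr_sam_step and the weighting beta are illustrative, not the authors' code, and the exact form of the paper's curvature regularizer may differ.

```python
import torch

def cr_sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05, beta=0.1):
    """Illustrative CR-SAM-style step (a sketch, not the authors' code).

    SAM part: evaluate the gradient at a point perturbed along the
    normalized gradient direction. Curvature part: the difference of
    gradients at symmetric perturbations w +/- rho * g/||g|| is a
    finite-difference proxy for curvature along g, added with weight beta.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient at the current weights (defines the perturbation direction).
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    def grad_at_offset(sign):
        # Shift weights by sign * rho * g/||g||, take the gradient, shift back.
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(sign * rho * g / grad_norm)
        shifted_loss = loss_fn(model(x), y)
        shifted_grads = torch.autograd.grad(shifted_loss, params)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.sub_(sign * rho * g / grad_norm)
        return shifted_grads

    g_plus = grad_at_offset(+1.0)    # gradient at the SAM ascent point
    g_minus = grad_at_offset(-1.0)   # gradient at the mirrored point

    # SAM gradient plus a curvature term from the symmetric difference.
    base_optimizer.zero_grad()
    for p, gp, gm in zip(params, g_plus, g_minus):
        p.grad = gp + beta * (gp - gm)
    base_optimizer.step()
    return loss.item()
```

The sketch mainly conveys the extra forward/backward passes a two-point scheme entails; the paper's actual regularizer is based on a normalized curvature measure and may combine these gradients differently.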
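Since all datasets used are public benchmarks, the CIFAR data can be pulled directly through torchvision. A minimal loading sketch follows; the augmentations shown are common CIFAR defaults, not necessarily the paper's exact preprocessing.

```python
import torchvision
import torchvision.transforms as T

# Common CIFAR-10 augmentation defaults; the paper's exact pipeline
# is not specified in this report.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())
```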
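The reported training hyperparameters map onto standard PyTorch optimizers as sketched below. Values marked as assumptions (SGD momentum, ViT learning rate, weight decay) were either grid-searched or not reported in the paper, so placeholders are used.

```python
import torch

def resnet_imagenet_optim(model, epochs=90):
    # ResNet50/101 on ImageNet: SGD with initial LR 0.1 and cosine
    # decay, as reported. Momentum and weight decay are assumptions;
    # the paper grid-searches weight decay and does not state momentum.
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched

def vit_optim(model, epochs=300):
    # ViT: AdamW with beta1=0.9, beta2=0.999, as reported.
    # Learning rate and weight decay are unreported placeholders.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3,
                            betas=(0.9, 0.999), weight_decay=0.05)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched
```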