CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Authors: Tao Wu, Tie Luo, Donald C. Wunsch II
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on CIFAR and ImageNet datasets shows that CR-SAM consistently enhances classification performance for ResNet and Vision Transformer (ViT) models across various datasets. Our comprehensive evaluation of CR-SAM spans a diverse range of contemporary DNN architectures. The empirical findings affirm that CR-SAM consistently outperforms both SAM and SGD in terms of improving model generalizability, across multiple datasets including CIFAR-10/100 and ImageNet-1k/-C/-R. |
| Researcher Affiliation | Academia | Tao Wu (1), Tie Luo (1)*, Donald C. Wunsch II (2); (1) Department of Computer Science, Missouri University of Science and Technology; (2) Department of Electrical and Computer Engineering, Missouri University of Science and Technology; {wuta, tluo, dwunsch}@mst.edu |
| Pseudocode | Yes | The full pseudo-code of our CR-SAM training is given in Algorithm 1. (An illustrative, hedged sketch of such a training step appears below this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/TrustAIoT/CR-SAM. |
| Open Datasets | Yes | To assess CR-SAM, we conduct thorough experiments on prominent image classification benchmark datasets: CIFAR-10/CIFAR-100 and ImageNet-1k/-C/-R. Specifically, we evaluate CR-SAM using the CIFAR-10/100 datasets (Krizhevsky, Hinton et al. 2009). This section details our evaluation on the ImageNet dataset (Deng et al. 2009), containing 1.28 million images across 1000 classes. Evaluation is extended to out-of-distribution data, namely ImageNet-C (Hendrycks and Dietterich 2019) and ImageNet-R (Hendrycks et al. 2021). |
| Dataset Splits | No | The paper mentions conducting a grid search to determine optimal hyperparameters that yield the highest test accuracy, which typically involves a validation set. However, it does not explicitly provide details about specific training/validation/test splits, percentages, or the methodology for setting up a distinct validation set. |
| Hardware Specification | Yes | These experiments are implemented using PyTorch and executed on Nvidia A100 and V100 GPUs. |
| Software Dependencies | No | The paper states that experiments are "implemented using PyTorch" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | On CIFAR, we train all models from scratch for 200 epochs, using batch size 128 and a cosine learning rate schedule. We conduct a grid search to determine the optimal learning rate, weight decay, perturbation magnitude (ρ), and coefficient (α and β) values that yield the highest test accuracy. On ImageNet, we train ResNet50 and ResNet101 with batch size 512 for 90 epochs; the initial learning rate is set to 0.1 and progressively decayed using a cosine schedule. For ViT models, we adopt AdamW (Loshchilov and Hutter 2019) as the base optimizer with parameters β1 = 0.9 and β2 = 0.999, and ViTs are trained with batch size 512 for 300 epochs. (A hedged configuration sketch also follows the table.) |
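For orientation, here is a minimal sketch of what a CR-SAM-style training step could look like in PyTorch. This is not the authors' Algorithm 1: the ascent step of magnitude `rho` follows standard SAM, and the way the curvature signal (the gradient difference between `w + e` and `w - e`) is folded into the descent direction via a hypothetical coefficient `beta` is an assumption; consult the paper and the repository for the exact procedure and the role of α and β.

```python
# Hedged sketch of a CR-SAM-style step (NOT the authors' exact Algorithm 1).
# Assumptions: SAM-style ascent of magnitude rho; a finite-difference curvature
# term g_plus - g_minus folded into the update with a hypothetical beta.
import torch

def crsam_style_step(model, loss_fn, x, y, base_opt, rho=0.05, beta=0.1):
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradient at the current point w.
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    g0 = [p.grad.detach().clone() for p in params]

    # 2) Normalized ascent direction e = rho * g0 / ||g0||.
    norm = torch.norm(torch.stack([g.norm(p=2) for g in g0]), p=2) + 1e-12
    e = [rho * g / norm for g in g0]

    # 3) Gradient at the perturbed point w + e.
    with torch.no_grad():
        for p, d in zip(params, e):
            p.add_(d)
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    g_plus = [p.grad.detach().clone() for p in params]

    # 4) Gradient at w - e (move back by 2e from w + e).
    with torch.no_grad():
        for p, d in zip(params, e):
            p.sub_(2.0 * d)
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()
    g_minus = [p.grad.detach().clone() for p in params]

    # 5) Restore w, then descend along g_plus plus a curvature correction;
    #    the combination below is an illustrative assumption.
    with torch.no_grad():
        for p, d in zip(params, e):
            p.add_(d)
    for p, gp, gm in zip(params, g_plus, g_minus):
        p.grad = gp + beta * (gp - gm)
    base_opt.step()
```

Note the cost implied by this structure: three forward/backward passes per step (at w, w + e, and w - e), versus two for SAM and one for SGD.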
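The experiment-setup row above can likewise be read as a concrete optimizer/schedule configuration. The sketch below mirrors only the hyperparameters the paper states (SGD with initial learning rate 0.1 and cosine decay for ResNets; AdamW with β1 = 0.9, β2 = 0.999 for ViTs); the momentum, weight decay, and ViT learning rate are placeholders, since the paper selects such values by grid search.

```python
# Hedged configuration sketch; values marked "placeholder" are assumptions,
# not the grid-searched hyperparameters reported by the authors.
import torch

def make_optimizer(model, arch="resnet", epochs=90):
    if arch == "resnet":
        # Paper's ImageNet recipe: SGD, initial lr 0.1, cosine decay.
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9,        # placeholder
                              weight_decay=1e-4)   # placeholder
    else:
        # Paper's ViT recipe: AdamW with beta1=0.9, beta2=0.999.
        opt = torch.optim.AdamW(model.parameters(),
                                lr=1e-3,            # placeholder
                                betas=(0.9, 0.999),
                                weight_decay=0.05)  # placeholder
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched
```

Usage would pair this with the step sketch above, e.g. `opt, sched = make_optimizer(model, "resnet", epochs=90)`, calling `sched.step()` once per epoch.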