Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

Authors: Haocheng Luo, Tuan Truong, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theory and the effectiveness of the proposed algorithm through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
Researcher Affiliation | Collaboration | Haocheng Luo (1), Tuan Truong (2), Tung Pham (2), Mehrtash Harandi (1), Dinh Phung (1), Trung Le (1); (1) Monash University, Australia; (2) VinAI Research, Vietnam
Pseudocode | Yes | The full algorithm is presented in Algorithms 1 and 2: Algorithm 1, power iteration to estimate the top eigenvector; Algorithm 2, Eigen-SAM. (A hedged power-iteration sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/RitianLuo/EigenSAM.
Open Datasets | Yes | We trained a fully-connected network... on the MNIST dataset (Deng, 2012). ... we applied it to several image classification tasks on benchmark datasets, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011).
Dataset Splits | Yes | The batch size was set to 256, with training conducted for 200 epochs. We used an initial learning rate of 0.1 for CIFAR-10, Fashion-MNIST, and CIFAR-100, and 0.01 for SVHN, adjusting the learning rate over time with a cosine schedule. The weight decay was set to 5e-5, and the momentum was 0.9. ... we tune the hyperparameter α for Eigen-SAM over {0.05, 0.1, 0.2} using 10% of the training set as a validation set. (A data-loading and split sketch appears after this table.)
Hardware Specification | Yes | All our experiments were conducted on NVIDIA RTX 4090 24GB GPUs.
Software Dependencies | No | The paper mentions "PyTorch's official repository" for a pre-trained model checkpoint but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The batch size was set to 256, with training conducted for 200 epochs. We used an initial learning rate of 0.1 for CIFAR-10, Fashion-MNIST, and CIFAR-100, and 0.01 for SVHN, adjusting the learning rate over time with a cosine schedule. The weight decay was set to 5e-5, and the momentum was 0.9. Detailed hyperparameter settings are provided in Appendix E. (A training-loop sketch using these reported settings follows below.)
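
The pseudocode row mentions power iteration for estimating the top eigenvector (Algorithm 1). The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the general technique, assuming Hessian-vector products via torch.autograd, with the helper name `top_hessian_eigenvector` chosen here purely for exposition.

```python
# Hedged sketch: power iteration on Hessian-vector products to estimate the
# dominant Hessian eigenvalue/eigenvector of a loss. Not the paper's code.
import torch


def top_hessian_eigenvector(loss, params, iters=10, eps=1e-12):
    """Estimate the top Hessian eigenpair of `loss` w.r.t. `params`."""
    params = [p for p in params if p.requires_grad]
    # Gradients with create_graph=True so we can differentiate through them.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Start from a random unit-norm direction.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v)) + eps
    v = [u / norm for u in v]

    eigval = None
    for _ in range(iters):
        # Hessian-vector product: Hv = d/dp (grad . v).
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v gives the current eigenvalue estimate.
        eigval = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv)) + eps
        v = [h / norm for h in hv]
    return eigval, v
```

The dataset and split rows report standard image-classification benchmarks with 10% of the training set held out for tuning α. The following is a minimal sketch of that kind of split for CIFAR-10, assuming torchvision; the data path, normalization constants, and worker counts are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch: CIFAR-10 with a 90%/10% train/validation split of the
# official training set, batch size 256 as reported in the paper.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Commonly used CIFAR-10 statistics; illustrative, not from the paper.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 10% of the training set as a validation set for tuning alpha.
val_size = len(full_train) // 10
train_set, val_set = random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=256, shuffle=False, num_workers=4)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False, num_workers=4)
```

The experiment-setup row lists batch size 256, 200 epochs, SGD with momentum 0.9, weight decay 5e-5, an initial learning rate of 0.1, and a cosine schedule. The sketch below simply wires those reported values into a plain PyTorch training loop; `model`, `train_loader`, and `criterion` are assumed placeholders, and the SAM/Eigen-SAM perturbation step itself is omitted.

```python
# Hedged sketch: reported optimizer/schedule settings in plain PyTorch.
# Assumes `model`, `criterion`, and `train_loader` are defined elsewhere.
import torch

epochs = 200
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()  # the SAM-style perturbation step is omitted here
    scheduler.step()
```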