Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization
Authors: Haocheng Luo, Tuan Truong, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our theory and the effectiveness of the proposed algorithm through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM. |
| Researcher Affiliation | Collaboration | Haocheng Luo¹, Tuan Truong², Tung Pham², Mehrtash Harandi¹, Dinh Phung¹, Trung Le¹; ¹Monash University, Australia; ²VinAI Research, Vietnam |
| Pseudocode | Yes | The full algorithm is presented in Algorithms 1 and 2. Algorithm 1: Power iteration to estimate the top eigenvector; Algorithm 2: Eigen-SAM. (A sketch of the power-iteration step appears below the table.) |
| Open Source Code | Yes | Code is available at https://github.com/RitianLuo/EigenSAM. |
| Open Datasets | Yes | We trained a fully-connected network... on the MNIST dataset (Deng, 2012). ... we applied it to several image classification tasks on benchmark datasets, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | ... we tune the hyperparameter α for Eigen-SAM over {0.05, 0.1, 0.2} using 10% of the training set as a validation set. (See the validation-split sketch below the table.) |
| Hardware Specification | Yes | All our experiments were conducted on NVIDIA RTX 4090 24GB GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch's official repository" for a pre-trained model checkpoint but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The batch size was set to 256, with training conducted for 200 epochs. We used an initial learning rate of 0.1 for CIFAR-10, Fashion-MNIST, and CIFAR-100, and 0.01 for SVHN, adjusting the learning rate over time with a cosine schedule. The weight decay was set to 5e-5, and the momentum was 0.9. Detailed hyperparameter settings are provided in Appendix E. (See the training-configuration sketch below the table.) |
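
The Pseudocode row refers to Algorithm 1, which estimates the top Hessian eigenvector by power iteration. The following is a minimal PyTorch-style sketch of that idea, not the authors' released implementation: the function names, the iteration count, and the use of `torch.autograd.grad` for Hessian-vector products are illustrative assumptions.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Hessian-vector product via double backprop: grad of (grad(loss) . vec)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params, retain_graph=True)

def top_eigenvector(loss, params, iters=10, eps=1e-12):
    """Power iteration on the Hessian of `loss` w.r.t. `params` (illustrative sketch)."""
    v = [torch.randn_like(p) for p in params]          # random start vector
    norm = torch.sqrt(sum((x ** 2).sum() for x in v)) + eps
    v = [x / norm for x in v]
    for _ in range(iters):
        hv = hessian_vector_product(loss, params, v)   # apply the Hessian
        norm = torch.sqrt(sum((x ** 2).sum() for x in hv)) + eps
        v = [x / norm for x in hv]                     # re-normalize
    hv = hessian_vector_product(loss, params, v)
    top_eigenvalue = sum((x * y).sum() for x, y in zip(hv, v))  # Rayleigh quotient
    return v, top_eigenvalue
```

In practice `loss` would be computed on a mini-batch and `params` would be the trainable parameters of the model (e.g. `params = [p for p in model.parameters() if p.requires_grad]`) before calling `top_eigenvector`.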
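The Experiment Setup row quotes batch size 256, 200 epochs, an initial learning rate of 0.1 (0.01 for SVHN) with a cosine schedule, weight decay 5e-5, and momentum 0.9. A minimal PyTorch sketch of that optimizer and schedule is below; the helper name `make_optimizer` and the per-dataset branching are assumptions for illustration, and the SAM/Eigen-SAM perturbation step itself is not shown.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS, BATCH_SIZE = 200, 256  # values quoted in the Experiment Setup row

def make_optimizer(model, dataset_name):
    # 0.1 for CIFAR-10, CIFAR-100, and Fashion-MNIST; 0.01 for SVHN (as reported above)
    lr = 0.01 if dataset_name == "SVHN" else 0.1
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-5)
    # Cosine learning-rate decay over the full training run
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler
```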
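The Dataset Splits row states that 10% of the training set is held out as a validation set for tuning α over {0.05, 0.1, 0.2}. A small sketch of such a split, assuming `torch.utils.data.random_split` and an illustrative fixed seed, is given below.

```python
import torch
from torch.utils.data import DataLoader, random_split

ALPHA_GRID = [0.05, 0.1, 0.2]  # candidate values for Eigen-SAM's alpha (from the table)

def make_split(train_set, val_fraction=0.1, batch_size=256, seed=0):
    """Hold out a fraction of the training set for validation (seed is illustrative)."""
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    generator = torch.Generator().manual_seed(seed)
    train_subset, val_subset = random_split(train_set, [n_train, n_val], generator=generator)
    train_loader = DataLoader(train_subset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_subset, batch_size=batch_size)
    return train_loader, val_loader
```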