Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization

Authors: Haocheng Luo, Tuan Truong, Tung Pham, Mehrtash Harandi, Dinh Phung, Trung Le

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theory and the effectiveness of the proposed algorithm through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
Researcher Affiliation | Collaboration | Haocheng Luo (1), Tuan Truong (2), Tung Pham (2), Mehrtash Harandi (1), Dinh Phung (1), Trung Le (1); (1) Monash University, Australia; (2) VinAI Research, Vietnam
Pseudocode | Yes | The full algorithm is presented in Algorithms 1 and 2: Algorithm 1, power iteration to estimate the top eigenvector; Algorithm 2, Eigen-SAM. (A hedged power-iteration sketch appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/RitianLuo/EigenSAM.
Open Datasets | Yes | We trained a fully-connected network... on the MNIST dataset (Deng, 2012). ... we applied it to several image classification tasks on benchmark datasets, including CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Fashion-MNIST (Xiao et al., 2017), and SVHN (Netzer et al., 2011).
Dataset Splits | Yes | The batch size was set to 256, with training conducted for 200 epochs. We used an initial learning rate of 0.1 for CIFAR-10, Fashion-MNIST, and CIFAR-100, and 0.01 for SVHN, adjusting the learning rate over time with a cosine schedule. The weight decay was set to 5e-5, and the momentum was 0.9. ... we tune the hyperparameter α for Eigen-SAM over {0.05, 0.1, 0.2} using 10% of the training set as a validation set. (A data-loading and split sketch appears after this table.)
Hardware Specification | Yes | All our experiments were conducted on NVIDIA RTX 4090 24GB GPUs.
Software Dependencies | No | The paper mentions "PyTorch's official repository" for a pre-trained model checkpoint but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The batch size was set to 256, with training conducted for 200 epochs. We used an initial learning rate of 0.1 for CIFAR-10, Fashion-MNIST, and CIFAR-100, and 0.01 for SVHN, adjusting the learning rate over time with a cosine schedule. The weight decay was set to 5e-5, and the momentum was 0.9. Detailed hyperparameter settings are provided in Appendix E. (A training-loop sketch using these reported settings follows below.)
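
The pseudocode row mentions power iteration for estimating the top eigenvector (Algorithm 1). The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the general technique, assuming Hessian-vector products via torch.autograd, with the helper name `top_hessian_eigenvector` chosen here purely for exposition.

```python
# Hedged sketch: power iteration on Hessian-vector products to estimate the
# dominant Hessian eigenvalue/eigenvector of a loss. Not the paper's code.
import torch


def top_hessian_eigenvector(loss, params, iters=10, eps=1e-12):
    """Estimate the top Hessian eigenpair of `loss` w.r.t. `params`."""
    params = [p for p in params if p.requires_grad]
    # Gradients with create_graph=True so we can differentiate through them.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Start from a random unit-norm direction.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v)) + eps
    v = [u / norm for u in v]

    eigval = None
    for _ in range(iters):
        # Hessian-vector product: Hv = d/dp (grad . v).
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v gives the current eigenvalue estimate.
        eigval = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv)) + eps
        v = [h / norm for h in hv]
    return eigval, v
```

The dataset and split rows report standard image-classification benchmarks with 10% of the training set held out for tuning α. The following is a minimal sketch of that kind of split for CIFAR-10, assuming torchvision; the data path, normalization constants, and worker counts are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch: CIFAR-10 with a 90%/10% train/validation split of the
# official training set, batch size 256 as reported in the paper.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Commonly used CIFAR-10 statistics; illustrative, not from the paper.
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# Hold out 10% of the training set as a validation set for tuning alpha.
val_size = len(full_train) // 10
train_set, val_set = random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
val_loader = DataLoader(val_set, batch_size=256, shuffle=False, num_workers=4)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False, num_workers=4)
```

The experiment-setup row lists batch size 256, 200 epochs, SGD with momentum 0.9, weight decay 5e-5, an initial learning rate of 0.1, and a cosine schedule. The sketch below simply wires those reported values into a plain PyTorch training loop; `model`, `train_loader`, and `criterion` are assumed placeholders, and the SAM/Eigen-SAM perturbation step itself is omitted.

```python
# Hedged sketch: reported optimizer/schedule settings in plain PyTorch.
# Assumes `model`, `criterion`, and `train_loader` are defined elsewhere.
import torch

epochs = 200
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()  # the SAM-style perturbation step is omitted here
    scheduler.step()
```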