Fundamental Convergence Analysis of Sharpness-Aware Minimization

Authors: Pham Khanh, Hoang-Chau Luong, Boris Mordukhovich, Dat Tran

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are conducted on classification tasks using deep learning models to confirm the practical aspects of our analysis.
Researcher Affiliation | Academia | Pham Duy Khanh (Ho Chi Minh City University of Education, khanhpd@hcmue.edu.vn); Hoang-Chau Luong (VNU-HCM University of Science, lhchau20@apcs.fitus.edu.vn); Boris S. Mordukhovich (Wayne State University, boris@math.wayne.edu); Dat Ba Tran (Wayne State University, tranbadat@wayne.edu)
Pseudocode | Yes | Algorithm 1: Inexact Gradient Descent (IGD) Methods; Algorithm 1a: General framework for normalized variants of SAM; Algorithm 2: IGDr; Algorithm 2a: Unnormalized Sharpness-Aware Minimization (USAM) [Andriushchenko and Flammarion, 2022]; Algorithm 2b: Extragradient Method [Korpelevich, 1976]
Open Source Code | No | The paper does not explicitly provide a link to its source code or a statement of code release within the main text.
Open Datasets | Yes | The algorithms are tested on two widely used image datasets: CIFAR-10 [Krizhevsky et al., 2009] and CIFAR-100 [Krizhevsky et al., 2009].
Dataset Splits | Yes | We train well-known deep neural networks including ResNet18 [He et al., 2016], ResNet34 [He et al., 2016], and WideResNet28-10 [Zagoruyko and Komodakis, 2016] on this dataset by using 10% of the training set as a validation set.
Hardware Specification | Yes | All the experiments are conducted on a computer with NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions "SGD Momentum" as a base optimizer but does not specify version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | All the models are trained by using SAM with SGD Momentum as the base optimizer for 200 epochs and a batch size of 128. ...we set the initial stepsize to 0.1, momentum to 0.9, the ℓ2-regularization parameter to 0.001, and the perturbation radius ρ to 0.05.
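
The update described in the Experiment Setup row can be illustrated with a minimal sketch of one SAM step on top of SGD with momentum, using the reported stepsize 0.1, momentum 0.9, ℓ2 parameter 0.001, and radius ρ = 0.05. This is not the authors' code: the toy quadratic objective and the names toy_loss, toy_grad, perturb, and sam_step are hypothetical, and two details are assumptions rather than facts from the paper: the momentum form (velocity = momentum * velocity + direction) and the choice to apply the ℓ2 term at the unperturbed weights. Setting normalize=False gives the unnormalized perturbation used by USAM.

```python
import math


def toy_loss(w):
    # Hypothetical toy objective: f(w) = 0.5 * (w0^2 + 10 * w1^2).
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def toy_grad(w):
    # Exact gradient of toy_loss.
    return [w[0], 10.0 * w[1]]


def perturb(w, g, rho=0.05, normalize=True):
    # SAM moves to w + rho * g / ||g||; USAM (normalize=False) moves to w + rho * g.
    if normalize:
        norm = math.sqrt(sum(gi * gi for gi in g))
        scale = rho / norm if norm > 0.0 else 0.0
    else:
        scale = rho
    return [wi + scale * gi for wi, gi in zip(w, g)]


def sam_step(w, velocity, lr=0.1, momentum=0.9, weight_decay=0.001,
             rho=0.05, normalize=True):
    # 1) Gradient at the current point defines the perturbation direction.
    w_adv = perturb(w, toy_grad(w), rho=rho, normalize=normalize)
    # 2) Gradient at the perturbed point is the descent direction.
    g_adv = toy_grad(w_adv)
    # 3) Assumption: the l2 term is added at the unperturbed weights.
    direction = [ga + weight_decay * wi for ga, wi in zip(g_adv, w)]
    # 4) Assumption: momentum buffer accumulates the direction, then scales by lr.
    new_velocity = [momentum * v + d for v, d in zip(velocity, direction)]
    new_w = [wi - lr * v for wi, v in zip(w, new_velocity)]
    return new_w, new_velocity


# Usage: a few steps from an arbitrary starting point.
w, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(3):
    w, v = sam_step(w, v)
```

In a real training run the gradient would come from a mini-batch (batch size 128 in the reported setup) rather than a closed-form function, and each step would need two forward-backward passes, one at the current weights and one at the perturbed weights.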