Fundamental Convergence Analysis of Sharpness-Aware Minimization
Authors: Pham Duy Khanh, Hoang-Chau Luong, Boris S. Mordukhovich, Dat Ba Tran
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are conducted on classification tasks using deep learning models to confirm the practical aspects of our analysis. |
| Researcher Affiliation | Academia | Pham Duy Khanh, Ho Chi Minh City University of Education (khanhpd@hcmue.edu.vn); Hoang-Chau Luong, VNU-HCM University of Science (lhchau20@apcs.fitus.edu.vn); Boris S. Mordukhovich, Wayne State University (boris@math.wayne.edu); Dat Ba Tran, Wayne State University (tranbadat@wayne.edu) |
| Pseudocode | Yes | Algorithm 1: Inexact Gradient Descent (IGD) Methods; Algorithm 1a: General framework for normalized variants of SAM; Algorithm 2: IGDr; Algorithm 2a: Unnormalized Sharpness-Aware Minimization (USAM) [Andriushchenko and Flammarion, 2022]; Algorithm 2b: Extragradient Method [Korpelevich, 1976] |
| Open Source Code | No | The paper does not explicitly provide a link to its source code or a statement of code release within the main text. |
| Open Datasets | Yes | The algorithms are tested on two widely used image datasets: CIFAR-10 [Krizhevsky et al., 2009] and CIFAR-100 [Krizhevsky et al., 2009]. |
| Dataset Splits | Yes | We train well-known deep neural networks including ResNet18 [He et al., 2016], ResNet34 [He et al., 2016], and WideResNet28-10 [Zagoruyko and Komodakis, 2016] on this dataset by using 10% of the training set as a validation set. |
| Hardware Specification | Yes | All the experiments are conducted on a computer with an NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions "SGD Momentum" as a base optimizer but does not specify version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | All the models are trained by using SAM with SGD Momentum as the base optimizer for 200 epochs and a batch size of 128. ...we set the initial stepsize to 0.1, momentum to 0.9, the ℓ2-regularization parameter to 0.001, and the perturbation radius ρ to 0.05. |
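
The pseudocode row lists several update rules. As a hedged illustration only, the sketch below writes the standard textbook forms of normalized SAM, USAM and the extragradient step on a toy quadratic chosen for this example; the objective coefficients `A`, the stepsize `t`, the radius `rho` and every helper name are assumptions, not the authors' code. It also checks the inexact-gradient view behind IGD: if ∇f is L-Lipschitz, the normalized SAM direction ∇f(x + ρ∇f(x)/‖∇f(x)‖) differs from ∇f(x) by at most Lρ.

```python
import math

# Toy objective (assumption, not from the paper): f(x) = 0.5 * sum(a_i * x_i^2).
# Its gradient is L-Lipschitz with L = max(a_i).
A = [1.0, 4.0]
L = max(A)


def grad(x):
    """Gradient of the toy quadratic: (a_i * x_i)_i."""
    return [a * xi for a, xi in zip(A, x)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def axpy(alpha, v, x):
    """Return x + alpha * v."""
    return [xi + alpha * vi for xi, vi in zip(x, v)]


def sam_step(x, t, rho):
    """Normalized SAM: evaluate the gradient at x + rho * g / ||g||, then descend."""
    g = grad(x)
    gn = norm(g)
    if gn == 0.0:
        return list(x)  # stationary point: no perturbation direction
    x_adv = axpy(rho / gn, g, x)
    return axpy(-t, grad(x_adv), x)


def usam_step(x, t, rho):
    """Unnormalized SAM (USAM): evaluate the gradient at x + rho * g."""
    x_adv = axpy(rho, grad(x), x)
    return axpy(-t, grad(x_adv), x)


def extragradient_step(x, t, rho):
    """Extragradient: evaluate the gradient at the look-ahead point x - rho * g."""
    x_look = axpy(-rho, grad(x), x)
    return axpy(-t, grad(x_look), x)


def sam_gradient_error(x, rho):
    """||SAM direction - true gradient|| at x (the IGD inexactness)."""
    g = grad(x)
    gn = norm(g)
    if gn == 0.0:
        return 0.0
    x_adv = axpy(rho / gn, g, x)
    return norm([ga - gi for ga, gi in zip(grad(x_adv), g)])


def run(step, x0, t, rho, iters):
    x = list(x0)
    for _ in range(iters):
        x = step(x, t, rho)
    return x


if __name__ == "__main__":
    x0 = [1.0, -2.0]
    for name, step in [("SAM", sam_step), ("USAM", usam_step), ("EG", extragradient_step)]:
        x = run(step, x0, t=0.1, rho=0.05, iters=200)
        print(name, "final gradient norm:", norm(grad(x)))
    # Inexact-gradient bound for normalized SAM: error <= L * rho.
    print(sam_gradient_error(x0, 0.05) <= L * 0.05)
```

On this quadratic, USAM and EG contract to the minimizer with the constant parameters used here. The normalized SAM direction, by contrast, does not vanish near the minimizer, because the perturbation ρ∇f(x)/‖∇f(x)‖ always has length ρ; with constant t and ρ its iterates keep moving near the minimizer instead of settling there.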
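
The experiment-setup row can be made concrete with a small worked example. The sketch below is an illustration under stated assumptions rather than the authors' training code: it records the reported hyperparameters and counts optimizer steps, assuming the standard 50,000-image CIFAR training split, the 10% validation hold-out, and that the final partial batch is kept (if it were dropped, an epoch would have 351 steps). It also spells out one SAM step with an SGD-momentum base optimizer; the PyTorch-style momentum buffer and adding the ℓ2 term at the unperturbed weights are assumptions, and `grad_fn` is a hypothetical gradient oracle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SAMConfig:
    """Hyperparameters as reported in the experiment setup."""
    epochs: int = 200
    batch_size: int = 128
    lr: float = 0.1             # initial stepsize
    momentum: float = 0.9
    weight_decay: float = 1e-3  # l2-regularization parameter
    rho: float = 0.05           # perturbation radius
    val_fraction: float = 0.1   # 10% of the training set held out


def steps_per_run(cfg, train_size=50_000):
    """Training images, steps per epoch, total steps (last partial batch kept: assumption)."""
    n_train = train_size - round(train_size * cfg.val_fraction)
    per_epoch = math.ceil(n_train / cfg.batch_size)
    return n_train, per_epoch, per_epoch * cfg.epochs


def sam_sgd_momentum_step(w, buf, grad_fn, cfg):
    """One SAM step with SGD momentum as the base optimizer (sketch)."""
    g = grad_fn(w)
    gn = math.sqrt(sum(gi * gi for gi in g))
    scale = cfg.rho / gn if gn > 0 else 0.0
    # Ascent to the perturbed point w + rho * g / ||g||.
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    # l2 term added at the unperturbed weights (assumption).
    d = [ga + cfg.weight_decay * wi for ga, wi in zip(g_adv, w)]
    # PyTorch-style momentum buffer: buf <- momentum * buf + d (assumption).
    buf = [cfg.momentum * bi + di for bi, di in zip(buf, d)]
    w = [wi - cfg.lr * bi for wi, bi in zip(w, buf)]
    return w, buf


if __name__ == "__main__":
    print(steps_per_run(SAMConfig()))
    # Toy oracle: gradient of 0.5 * ||w||^2 is w itself.
    print(sam_sgd_momentum_step([1.0, 0.0], [0.0, 0.0], lambda v: list(v), SAMConfig()))
```

Under these assumptions, 45,000 training images with a batch size of 128 give ceil(45000 / 128) = 352 steps per epoch, or 70,400 steps over 200 epochs. Each SAM step needs two gradient evaluations, so the gradient cost is roughly twice that of plain SGD momentum over the same schedule.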