Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Authors: Dongkuk Si, Chulhee Yun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results highlight vastly different characteristics of SAM with vs. without decaying perturbation size or gradient normalization, and suggest that the intuitions gained from one version may not apply to the other. (...) Figure 1: Trajectory plot for a function f(x, y) = (xy - 1)^2. (...) Figure 2: Examples of virtual loss plot for deterministic and stochastic SAM. (...) Figure 6: The results of the SAM simulations on the example functions. The yellow line indicates the trajectory of SAM iterates. (...) All plots empirically verify that practical SAM cannot converge all the way to optima. |
| Researcher Affiliation | Academia | Dongkuk Si, Chulhee Yun; Korea Advanced Institute of Science and Technology (KAIST); {dongkuksi, chulhee.yun}@kaist.ac.kr |
| Pseudocode | No | The paper describes the SAM update equations (2) and (3) in text but does not present them in a formal pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the methodology described. |
| Open Datasets | No | The paper performs simulations on 'example functions' (e.g., f(x,y) = (xy-1)^2) to illustrate theoretical points, rather than using established public datasets for training. Thus, there is no mention of a publicly available dataset for training. |
| Dataset Splits | No | The paper does not mention or specify any training/validation/test dataset splits. The research is primarily theoretical with illustrative simulations on synthetic functions, not empirical studies on large datasets. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its simulations or analyses. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate its theoretical analyses or simulations. |
| Experiment Setup | Yes | For deterministic SAM, we prove convergence to global minima of smooth strongly convex functions, and show the tightness of convergence rate in terms of T. Furthermore, we establish the convergence of SAM to stationary points of smooth convex functions. For smooth nonconvex functions, we prove that SAM guarantees convergence to stationary points up to an additive factor O(ρ^2). We provide a worst-case example that always suffers a matching squared gradient norm Ω(ρ^2), showing that the additive factor is unavoidable and tight in terms of ρ. (...) All plots empirically verify that practical SAM cannot converge all the way to optima. Instead, the iterates get trapped in certain regions. (a) and (d) display deterministic SAM iterates (with initialization x_0 = 0.4) and the plot of x-coordinate values over epochs, for a smooth nonconvex function as shown in Figure 2(a) under settings in Theorem 3.5. |
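
Since the paper releases no code (see the Open Source Code row), the simulations quoted in the Experiment Setup row cannot be checked against a reference implementation. Below is a minimal, hypothetical sketch of the kind of simulation described there: deterministic SAM with a constant perturbation size ρ and gradient normalization, run on the paper's example function f(x, y) = (xy - 1)^2 from Figure 1. The step size, ρ, iteration count, and initialization are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

# Hedged sketch: deterministic SAM with constant perturbation size rho and
# gradient normalization, applied to the paper's example f(x, y) = (x*y - 1)^2.
# Step size, rho, iteration count, and initialization are illustrative choices,
# not the paper's exact experimental settings.

def f(w):
    x, y = w
    return (x * y - 1.0) ** 2

def grad_f(w):
    x, y = w
    r = x * y - 1.0
    return np.array([2.0 * r * y, 2.0 * r * x])

def sam_step(w, eta=0.01, rho=0.1, eps=1e-12):
    """One practical SAM update: ascend to w + rho * g/||g||, then descend."""
    g = grad_f(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)   # normalized ascent step
    return w - eta * grad_f(w_adv)                    # descent at perturbed point

w = np.array([2.0, 3.0])            # illustrative initialization
for t in range(20000):
    w = sam_step(w)

# With constant rho, the iterates are expected to stall at a nonzero loss
# rather than reach the global minima {xy = 1}, illustrating the paper's claim
# that practical SAM cannot converge all the way to optima.
print("final iterate:", w, "loss:", f(w), "grad norm:", np.linalg.norm(grad_f(w)))
```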