Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Authors: Dongkuk Si, Chulhee Yun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results highlight vastly different characteristics of SAM with vs. without decaying perturbation size or gradient normalization, and suggest that the intuitions gained from one version may not apply to the other. (...) Figure 1: Trajectory plot for a function f(x, y) = (xy - 1)^2. (...) Figure 2: Examples of virtual loss plot for deterministic and stochastic SAM. (...) Figure 6: The results of the SAM simulations on the example functions. The yellow line indicates the trajectory of SAM iterates. (...) All plots empirically verify that practical SAM cannot converge all the way to optima. |
| Researcher Affiliation | Academia | Dongkuk Si, Chulhee Yun; Korea Advanced Institute of Science and Technology (KAIST); {dongkuksi, chulhee.yun}@kaist.ac.kr |
| Pseudocode | No | The paper describes the SAM update equations (2) and (3) in text but does not present them in a formal pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the methodology described. |
| Open Datasets | No | The paper performs simulations on 'example functions' (e.g., f(x,y) = (xy-1)^2) to illustrate theoretical points, rather than using established public datasets for training. Thus, there is no mention of a publicly available dataset for training. |
| Dataset Splits | No | The paper does not mention or specify any training/validation/test dataset splits. The research is primarily theoretical with illustrative simulations on synthetic functions, not empirical studies on large datasets. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its simulations or analyses. |
| Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate its theoretical analyses or simulations. |
| Experiment Setup | Yes | For deterministic SAM, we prove convergence to global minima of smooth strongly convex functions, and show the tightness of convergence rate in terms of T. Furthermore, we establish the convergence of SAM to stationary points of smooth convex functions. For smooth nonconvex functions, we prove that SAM guarantees convergence to stationary points up to an additive factor O(ρ^2). We provide a worst-case example that always suffers a matching squared gradient norm Ω(ρ^2), showing that the additive factor is unavoidable and tight in terms of ρ. (...) All plots empirically verify that practical SAM cannot converge all the way to optima. Instead, the iterates get trapped in certain regions. (a) and (d) display deterministic SAM iterates (with initialization x_0 = 0.4) and the plot of x-coordinate values over epochs, for a smooth nonconvex function as shown in Figure 2(a) under settings in Theorem 3.5. |
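
Since the paper releases no code (see the Open Source Code row), the simulations quoted in the Experiment Setup row cannot be checked against a reference implementation. Below is a minimal, hypothetical sketch of the kind of simulation described there: deterministic SAM with a constant perturbation size ρ and gradient normalization, run on the paper's example function f(x, y) = (xy - 1)^2 from Figure 1. The step size, ρ, iteration count, and initialization are illustrative assumptions, not the paper's reported settings.

```python
import numpy as np

# Hedged sketch: deterministic SAM with constant perturbation size rho and
# gradient normalization, applied to the paper's example f(x, y) = (x*y - 1)^2.
# Step size, rho, iteration count, and initialization are illustrative choices,
# not the paper's exact experimental settings.

def f(w):
    x, y = w
    return (x * y - 1.0) ** 2

def grad_f(w):
    x, y = w
    r = x * y - 1.0
    return np.array([2.0 * r * y, 2.0 * r * x])

def sam_step(w, eta=0.01, rho=0.1, eps=1e-12):
    """One practical SAM update: ascend to w + rho * g/||g||, then descend."""
    g = grad_f(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)   # normalized ascent step
    return w - eta * grad_f(w_adv)                    # descent at perturbed point

w = np.array([2.0, 3.0])            # illustrative initialization
for t in range(20000):
    w = sam_step(w)

# With constant rho, the iterates are expected to stall at a nonzero loss
# rather than reach the global minima {xy = 1}, illustrating the paper's claim
# that practical SAM cannot converge all the way to optima.
print("final iterate:", w, "loss:", f(w), "grad norm:", np.linalg.norm(grad_f(w)))
```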