How Does Sharpness-Aware Minimization Minimize Sharpness?

Authors: Kaiyue Wen, Tengyu Ma, Zhiyuan Li

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper rigorously nails down the exact sharpness notion that SAM regularizes and clarifies the underlying mechanism. The authors show that the two steps of approximation in SAM's original motivation individually lead to inaccurate local conclusions, but that their combination accidentally reveals the correct effect when full-batch gradients are applied. They further prove that the stochastic version of SAM in fact regularizes a third notion of sharpness, which is most likely the preferred notion for practical performance. (The update rule and the sharpness notions are restated below this table.)
Researcher Affiliation | Academia | Kaiyue Wen, Institute for Interdisciplinary Information Sciences, Tsinghua University (wenky20@mails.tsinghua.edu.cn); Tengyu Ma and Zhiyuan Li, Computer Science Department, Stanford University ({tengyuma,zhiyuanli}@stanford.edu)
Pseudocode | No | The paper presents its algorithms as update equations rather than step-by-step pseudocode. (A minimal sketch of the SAM update appears below this table.)
Open Source Code | No | The paper is theoretical and does not mention releasing source code for its methodology.
Open Datasets | No | The paper is theoretical and describes only a toy example; it does not use or release any open datasets.
Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits; it describes only a toy example, not experiments on real data.
Hardware Specification | No | The paper does not provide any specific hardware details for running its analysis or the toy example visualization.
Software Dependencies | No | The paper does not list ancillary software dependencies or version numbers.
Experiment Setup | No | The paper primarily focuses on theoretical analysis of the SAM algorithm. While it analyzes hyperparameters such as the perturbation radius ρ and the learning rate η, it does not describe an empirical experiment setup.
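
For readers scanning the table, it may help to restate what the Research Type row summarizes. The following LaTeX fragment is our paraphrase of the setup, not a quotation from the paper: the standard full-batch SAM update, together with the sharpness notions the paper proves it regularizes in the small-η, small-ρ regime.

```latex
% Full-batch SAM update (Foret et al., 2021), the object of the paper's
% analysis: an ascent step of radius rho along the normalized gradient,
% then a descent step using the gradient at the perturbed point.
\[
  w_{t+1} \;=\; w_t \;-\; \eta \, \nabla L\!\left( w_t
      + \rho \, \frac{\nabla L(w_t)}{\lVert \nabla L(w_t) \rVert_2} \right)
\]
% Our paraphrase of the paper's conclusions: near the manifold of
% minimizers and as \eta, \rho -> 0, full-batch SAM implicitly minimizes
% the worst-direction sharpness \lambda_{\max}\!\left(\nabla^2 L(w)\right),
% while stochastic SAM with batch size 1 instead minimizes the
% average-direction sharpness \operatorname{Tr}\!\left(\nabla^2 L(w)\right),
% the third notion referenced in the table.
```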
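Since the Pseudocode row records that the paper gives the algorithm only as update equations, here is a minimal runnable NumPy sketch of that two-step update. Everything here is illustrative: `sam_step`, `loss_grad`, and the quadratic toy loss are our hypothetical names and choices, not artifacts from the paper.

```python
import numpy as np

def sam_step(w, loss_grad, rho=0.05, lr=0.1, eps=1e-12):
    """One full-batch SAM update: ascend a distance rho along the
    normalized gradient, then descend using the gradient evaluated
    at the perturbed point.

    w         : current parameter vector (np.ndarray)
    loss_grad : callable returning the gradient of the loss at a point
    rho       : perturbation radius (SAM's key hyperparameter)
    lr        : learning rate
    """
    g = loss_grad(w)
    # Ascent step: move distance rho in the direction of steepest ascent.
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)
    # Descent step: use the gradient at the perturbed point.
    return w - lr * loss_grad(w_adv)

# Toy usage: quadratic loss L(w) = 0.5 * w^T A w, whose gradient is A w.
A = np.diag([10.0, 1.0])   # one sharp direction, one flat direction
grad = lambda w: A @ w
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad)
```

The toy quadratic merely exercises the update; the paper's sharpness results concern behavior near a manifold of minimizers, which a strictly convex quadratic does not have.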