How Does Sharpness-Aware Minimization Minimize Sharpness?
Authors: Kaiyue Wen, Tengyu Ma, Zhiyuan Li
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The paper rigorously pins down the exact notion of sharpness that SAM regularizes and clarifies the underlying mechanism. The authors show that the two approximation steps in SAM's original motivation individually lead to inaccurate local conclusions, but that their combination accidentally reveals the correct effect when full-batch gradients are applied. They further prove that the stochastic version of SAM in fact regularizes a third notion of sharpness, which is most likely the preferred notion for practical performance. |
| Researcher Affiliation | Academia | Kaiyue Wen, Institute for Interdisciplinary Information Sciences, Tsinghua University (wenky20@mails.tsinghua.edu.cn); Tengyu Ma and Zhiyuan Li, Computer Science Department, Stanford University ({tengyuma,zhiyuanli}@stanford.edu) |
| Pseudocode | No | The paper presents its algorithms as mathematical equations rather than explicit step-by-step pseudocode. |
| Open Source Code | No | The paper is theoretical and does not mention releasing source code for its methodology. |
| Open Datasets | No | The paper is theoretical and does not use open datasets; it only describes a toy example. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments requiring dataset splits; it only describes a toy example. |
| Hardware Specification | No | The paper does not provide any specific hardware details for running its analysis or the toy example visualization. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical analysis of the SAM algorithm. While it analyzes algorithmic parameters, it does not describe a concrete experimental setup. |
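For context on the algorithm the table refers to: SAM's standard update ascends to a first-order worst-case perturbation within an L2 ball of radius rho, then takes a gradient step at the perturbed point. The sketch below is an illustrative NumPy implementation of this generic full-batch update on a toy quadratic loss, not code from the paper; the names `sam_step`, `grad_fn`, `lr`, and `rho` are our own.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One full-batch SAM update: ascend to the (first-order)
    worst-case perturbation in an L2 ball of radius rho,
    then descend using the gradient at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    return w - lr * grad_fn(w + eps)             # descent at w + eps

# Toy quadratic loss L(w) = 0.5 * w^T A w with unequal curvatures,
# in the spirit of the paper's toy example (details are ours).
A = np.diag([10.0, 1.0])
grad = lambda w: A @ w
w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad)
```

On this quadratic, the iterates contract toward the minimum at the origin, with the perturbation term penalizing the high-curvature (sharp) direction more strongly than plain gradient descent would.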