How to Escape Sharp Minima with Random Perturbations

Authors: Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run experiments based on training ResNet-18 on the CIFAR10 dataset to test the ability of the proposed algorithms to escape sharp global minima. Following Damian et al. (2021), the algorithms are initialized at a point corresponding to a sharp global minimizer that achieves poor test accuracy. Crucially, we choose this setting because Damian et al. (2021, Figure 1) verify that test accuracy is inversely correlated with the trace of the Hessian (see Figure 2). This bad global minimizer, due to Liu et al. (2020), achieves 100% training accuracy but only 48% test accuracy. We choose the constant learning rate of η = 0.001, which is small enough that the SGD baseline without any perturbation does not escape. We discuss the results one by one. First, we highlight that the training accuracy stays at 100% for all algorithms. Comparison between the two methods: in the left plot of Figure 3, we compare the performance of Randomly Smoothed Perturbation (RS) and Sharpness-Aware Perturbation (SA). We choose a batch size of 128 for both methods. Consistent with our theory, SA is more effective at escaping sharp minima, even with a smaller perturbation radius ρ. Different batch sizes: our theory suggests that batch size 1 should be effective in escaping sharp minima. We verify this in the right plot of Figure 3 by choosing the batch size B = 1, 64, 128. We do see that the case of B = 1 is quite effective in escaping sharp minima.
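For concreteness, here is a minimal sketch of the escape protocol the response describes, assuming PyTorch/torchvision. The checkpoint path `sharp_minimizer.pt`, the use of torchvision's stock ResNet-18 (rather than a CIFAR-specific variant), and the `perturbed_step` hook are illustrative assumptions, not details from the paper.

```python
import torch
import torchvision

# Hypothetical checkpoint of the sharp global minimizer (Liu et al., 2020):
# 100% train accuracy but only 48% test accuracy at the start of the run.
model = torchvision.models.resnet18(num_classes=10)
model.load_state_dict(torch.load("sharp_minimizer.pt"))  # assumed path

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def run(model, train_loader, test_loader, perturbed_step, epochs, eta=1e-3):
    """Apply a perturbed-SGD variant starting from the sharp minimizer.
    Escape shows up as rising test accuracy while train accuracy stays at 100%."""
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            perturbed_step(model, x, y, eta)  # RS or SA update, sketched below
        print(epoch, accuracy(model, train_loader), accuracy(model, test_loader))
```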
Researcher Affiliation | Collaboration | Kwangjun Ahn (MIT, Microsoft Research); Ali Jadbabaie (MIT); Suvrit Sra (MIT, TU Munich).
Pseudocode | Yes | Algorithm 1: Randomly Smoothed Perturbation; Algorithm 2: Sharpness-Aware Perturbation.
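The paper gives these only as pseudocode; below is a hedged sketch of what the two per-step update rules plausibly look like, assuming RS evaluates the minibatch gradient at a uniformly random perturbation of radius ρ and SA at a SAM-style ascent point along the normalized minibatch gradient. Function names, the noise distribution, and the default ρ are assumptions; both rules reduce to plain SGD at ρ = 0.

```python
import torch
import torch.nn.functional as F

def _offset(params, direction, scale):
    # Shift all parameters in place along `direction`, scaled by `scale`.
    with torch.no_grad():
        for p, d in zip(params, direction):
            p.add_(scale * d)

def rs_step(model, x, y, eta, rho=0.01):
    """Algorithm 1 (Randomly Smoothed Perturbation), sketched: the gradient
    is evaluated at a randomly perturbed copy of the current weights."""
    params = [p for p in model.parameters() if p.requires_grad]
    noise = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((n ** 2).sum() for n in noise))
    direction = [n / norm for n in noise]        # random unit direction
    _offset(params, direction, rho)              # move to w + rho * u
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()      # stochastic gradient there
    _offset(params, direction, -rho)             # move back to w
    with torch.no_grad():
        for p in params:
            p.sub_(eta * p.grad)                 # SGD step with eta = 0.001

def sa_step(model, x, y, eta, rho=0.01):
    """Algorithm 2 (Sharpness-Aware Perturbation), sketched: a SAM-style
    ascent step along the normalized minibatch gradient before the update."""
    params = [p for p in model.parameters() if p.requires_grad]
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()      # gradient at w
    norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    direction = [p.grad / (norm + 1e-12) for p in params]
    _offset(params, direction, rho)              # move to w + rho * g/||g||
    model.zero_grad()
    F.cross_entropy(model(x), y).backward()      # gradient at the ascent point
    _offset(params, direction, -rho)             # move back to w
    with torch.no_grad():
        for p in params:
            p.sub_(eta * p.grad)
```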
Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository for the described methodology.
Open Datasets | Yes | We run experiments based on training ResNet-18 on the CIFAR10 dataset to test the ability of the proposed algorithms to escape sharp global minima.
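CIFAR10 is publicly available; as one illustration, it can be loaded with torchvision (an assumption — the paper does not name its data pipeline). CIFAR10's canonical split is 50,000 training and 10,000 test images.

```python
import torch
import torchvision
import torchvision.transforms as T

# Minimal sketch assuming torchvision; augmentation choices are not
# specified by the paper, so only tensor conversion is applied here.
transform = T.ToTensor()
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=256)
```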
Dataset Splits | No | The paper mentions training on the CIFAR10 dataset and discusses training and test accuracy, but it does not specify the dataset splits (e.g., percentages or counts for training, validation, and test sets) needed for reproduction.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computing platforms) used for running the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries, or solvers) that would be needed to reproduce the experiments.
Experiment Setup | Yes | We choose the constant learning rate of η = 0.001, which is small enough that the SGD baseline without any perturbation does not escape. We choose a batch size of 128 for both methods. We verify this in the right plot of Figure 3 by choosing the batch size B = 1, 64, 128.
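A hypothetical sweep mirroring the quoted setup might look like the following; the perturbation radii are placeholders, since the quoted text does not give the exact ρ values.

```python
# Hypothetical sweep over the quoted setup: fixed eta = 0.001 and
# batch sizes B in {1, 64, 128}; rho values are placeholders.
import itertools

ETA = 1e-3
BATCH_SIZES = [1, 64, 128]
RHOS = [0.01, 0.05]  # placeholder perturbation radii, not from the paper

for B, rho in itertools.product(BATCH_SIZES, RHOS):
    print(f"run: eta={ETA}, batch_size={B}, rho={rho}")
    # ... build loaders with batch_size=B and call run(...) with rs_step or sa_step
```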