How to Escape Sharp Minima with Random Perturbations
Authors: Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments based on training ResNet-18 on the CIFAR10 dataset to test the ability of the proposed algorithms to escape sharp global minima. Following (Damian et al., 2021), the algorithms are initialized at a point corresponding to a sharp global minimizer that achieves poor test accuracy. Crucially, we choose this setting because Damian et al. (2021, Figure 1) verify that test accuracy is inversely correlated with the trace of the Hessian (see Figure 2). This bad global minimizer, due to (Liu et al., 2020), achieves 100% training accuracy but only 48% test accuracy. We choose a constant learning rate of η = 0.001, which is small enough that the SGD baseline without any perturbation does not escape. We discuss the results one by one. First, we highlight that the training accuracy stays at 100% for all algorithms. Comparison between the two methods: in the left plot of Figure 3, we compare the performance of Randomly Smoothed Perturbation (RS) and Sharpness-Aware Perturbation (SA), using a batch size of 128 for both methods. Consistent with our theory, SA is more effective at escaping sharp minima even with a smaller perturbation radius ρ. Different batch sizes: our theory suggests that batch size 1 should be effective in escaping sharp minima. We verify this in the right plot of Figure 3 with batch sizes B = 1, 64, 128, and the B = 1 case is indeed quite effective at escaping sharp minima. |
| Researcher Affiliation | Collaboration | Kwangjun Ahn (MIT, Microsoft Research), Ali Jadbabaie (MIT), Suvrit Sra (MIT, TU Munich). |
| Pseudocode | Yes | Algorithm 1: Randomly Smoothed Perturbation; Algorithm 2: Sharpness-Aware Perturbation. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We run experiments based on training ResNet-18 on the CIFAR10 dataset to test the ability of the proposed algorithms to escape sharp global minima. |
| Dataset Splits | No | The paper mentions training on the CIFAR10 dataset and discusses training and test accuracy, but it does not specify the dataset splits (e.g., percentages or counts for training, validation, and test sets) needed for reproduction. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory, or specific computing platforms) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies or their version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries, or solvers) that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | We choose the constant learning rate of η = 0.001, which is small enough such that SGD baseline without any perturbation does not escape. We choose the batch size of 128 for both methods. We verify this in the right plot of Figure 3 by choosing the batch size to be B = 1, 64, 128. |
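The two perturbation schemes named in the table can be sketched at the update-rule level. The following is a minimal NumPy sketch, not the authors' implementation: it assumes RS evaluates the gradient at a random point on a sphere of radius ρ around the iterate, while SA takes the gradient after an ascent step of length ρ along the stochastic gradient (as in sharpness-aware methods); the function names, signatures, and the quadratic toy loss are illustrative.

```python
import numpy as np

def rs_step(w, grad_fn, eta=0.001, rho=0.1, rng=None):
    """Randomly Smoothed Perturbation (sketch): evaluate the gradient
    at a uniformly random point on the sphere of radius rho around w."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(w.shape)
    u *= rho / np.linalg.norm(u)  # random direction, scaled to radius rho
    return w - eta * grad_fn(w + u)

def sa_step(w, grad_fn, eta=0.001, rho=0.1):
    """Sharpness-Aware Perturbation (sketch): evaluate the gradient
    after an ascent step of length rho along the current gradient."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent direction
    return w - eta * grad_fn(w + eps)
```

On a toy quadratic loss f(w) = ½‖w‖², both steps drive the iterate toward the minimizer, with RS hovering at a noise floor set by ρ; the paper's comparison concerns which perturbation escapes a sharp minimum faster, which this sketch does not attempt to reproduce.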