SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Authors: Atish Agarwala, Yann Dauphin
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper presents experimental results on realistic models, showing that the SAM-EOS predicts the largest Hessian eigenvalue for Wide ResNet 28-10 on CIFAR-10; its sections cover experiments on basic models and a connection to realistic models. |
| Researcher Affiliation | Industry | Atish Agarwala and Yann Dauphin, Google DeepMind. Correspondence to: Atish Agarwala <thetish@google.com>. |
| Pseudocode | No | The paper provides mathematical equations and theoretical derivations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology or provide a link to a code repository. |
| Open Datasets | Yes | We conducted experiments on the popular CIFAR-10 dataset (Krizhevsky et al., 2009) using the Wide Resnet 28-10 architecture (Zagoruyko & Komodakis, 2016). |
| Dataset Splits | No | The paper mentions using CIFAR-10 and training on the 'first 2 classes of CIFAR' but does not specify the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard split references beyond just the dataset name). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as GPU or CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | For MSE, we use η = 0.3, µ = 0.005 and η = 0.4, µ = 0.005 for cross-entropy. We use the cosine learning rate schedule (Loshchilov & Hutter, 2016) and SGD instead of Nesterov momentum (Sutskever et al., 2013) to better match the theoretical setup. (...) We keep all other hyper-parameters to the default values described in the original Wide Resnet paper. |
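The setup row quotes hyperparameters but, as noted above, the paper ships no pseudocode or source. A minimal NumPy sketch of one SAM update on a toy quadratic loss, assuming η is the learning rate and µ the SAM perturbation radius (that mapping of µ is our reading of the notation, not stated in the table), using the quoted MSE values η = 0.3, µ = 0.005:

```python
import numpy as np

def sam_step(params, grad_fn, eta=0.3, mu=0.005):
    """One SAM update: take a normalized ascent step of radius mu to a
    worst-case nearby point, then descend with learning rate eta using
    the gradient evaluated at that perturbed point."""
    g = grad_fn(params)
    eps = mu * g / (np.linalg.norm(g) + 1e-12)  # normalized perturbation
    g_adv = grad_fn(params + eps)               # gradient at perturbed point
    return params - eta * g_adv

# Toy quadratic loss L(w) = 0.5 * w @ H @ w, so grad L = H @ w.
# The largest eigenvalue of H is 4, so eta * lambda_max = 1.2 < 2
# and plain gradient descent (and this SAM variant) is stable.
H = np.diag([4.0, 1.0])
grad = lambda w: H @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad)
```

This is only an illustration of the SAM update rule on a stand-in loss; the paper's actual runs use Wide ResNet 28-10 on CIFAR-10 with a cosine learning-rate schedule, which this sketch does not reproduce.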