A Modern Look at the Relationship between Sharpness and Generalization

Authors: Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
| Researcher Affiliation | Academia | 1EPFL 2Tübingen AI Center 3University of Tübingen. Correspondence to: Maksym Andriushchenko <maksym.andriushchenko@epfl.ch>. |
| Pseudocode | Yes | For convenience we restate the algorithm of Auto-PGD in Algorithm 1. (A simplified sketch of this sharpness-maximization step is given below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/tml-epfl/sharpness-vs-generalization. |
| Open Datasets | Yes | We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
| Dataset Splits | No | No explicit statement gives specific dataset-split information (exact percentages, sample counts, or splitting methodology) for training, validation, and test sets; the paper relies on the standard splits of public datasets or refers to models from other works. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using the vit-pytorch library for the ViT architecture and Auto-PGD for sharpness evaluation, but does not provide version numbers for these or other software dependencies such as PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | We train models for 200 epochs using SGD with momentum and linearly decreasing learning rates after a linear warm-up for the first 40% iterations. We vary the learning rate, ρ ∈ {0, 0.05, 0.1} of SAM (Foret et al., 2021), mixup (α = 0.5) (Zhang et al., 2018), and standard augmentations combined with RandAugment (Cubuk et al., 2020). |