A Modern Look at the Relationship between Sharpness and Generalization

Authors: Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
| Researcher Affiliation | Academia | 1EPFL 2Tübingen AI Center 3University of Tübingen. Correspondence to: Maksym Andriushchenko <maksym.andriushchenko@epfl.ch>. |
| Pseudocode | Yes | For convenience we restate the algorithm of Auto-PGD in Algorithm 1. (A simplified sketch of this sharpness-maximization step is given below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/tml-epfl/sharpness-vs-generalization. |
| Open Datasets | Yes | We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. |
| Dataset Splits | No | No explicit statement gives specific dataset-split information (exact percentages, sample counts, or splitting methodology) for training, validation, and test sets; the paper relies on the standard splits of public datasets or refers to models from other works. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using the vit-pytorch library for the ViT architecture and Auto-PGD for sharpness evaluation, but does not provide version numbers for these or other software dependencies such as PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | We train models for 200 epochs using SGD with momentum and linearly decreasing learning rates after a linear warm-up for the first 40% iterations. We vary the learning rate, ρ ∈ {0, 0.05, 0.1} of SAM (Foret et al., 2021), mixup (α = 0.5) (Zhang et al., 2018), and standard augmentations combined with RandAugment (Cubuk et al., 2020). |