Fisher SAM: Information Geometry and Sharpness Aware Minimisation

Authors: Minyoung Kim, Da Li, Shell X Hu, Timothy Hospedales

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate improved performance of the proposed Fisher SAM on several benchmark datasets/tasks: image classification, ImageNet overtraining and finetuning, label-noise robust learning, and robustness to parameter perturbation during inference.
Researcher Affiliation | Collaboration | Samsung AI Center, Cambridge, UK; University of Edinburgh. Correspondence to: Minyoung Kim <mikim21@gmail.com>.
Pseudocode | Yes | Algorithm 1 Fisher SAM. (A hedged code sketch of this update step is given below the table.)
Open Source Code | No | The paper references a publicly available ImageNet pre-trained model (footnote 4: https://github.com/facebookresearch/deit) but does not provide a link or explicit statement about open-source code for the proposed Fisher SAM method.
Open Datasets | Yes | on the CIFAR-10/100 datasets (Krizhevsky, 2009).
Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts) needed to reproduce the partitioning, nor does it refer to predefined splits with citations for all datasets used.
Hardware Specification | Yes | In practice, the difference is negligible: the per-batch (batch size 128) times for CIFAR10/WRN28-10 are: 0.2322 seconds (SAM), 0.2334 seconds (FSAM) on a single RTX 2080 Ti machine.
Software Dependencies | No | The paper mentions 'PyTorch', 'TensorFlow', and 'JAX' but does not specify their version numbers or the versions of any other specific software dependencies required for replication.
Experiment Setup | Yes | Following the experimental setups suggested in (Foret et al., 2021; Kwon et al., 2021), we employ several ResNet (He et al., 2016)-based backbone networks... we use the SGD optimiser with momentum 0.9, weight decay 0.0005, initial learning rate 0.1, cosine learning rate scheduling (Loshchilov & Hutter, 2016), for up to 200 epochs (400 for SGD) with batch size 128. For the PyramidNet, we use batch size 256 and initial learning rate 0.05, trained for up to 900 epochs (1800 for SGD). We also apply AutoAugment (Cubuk et al., 2019) and Cutout (DeVries & Taylor, 2017) data augmentation, and label smoothing (Müller et al., 2019) with factor 0.1 is used for defining the loss function. We perform a grid search to find the best hyperparameters (γ, η) for FSAM, and they are (γ = 0.1, η = 1.0) for both CIFAR-10 and CIFAR-100 across all backbones except for the PyramidNet. For the PyramidNet on CIFAR-100, we set (γ = 0.5, η = 0.1). (A hedged sketch of this training configuration appears below the table.)
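
The Pseudocode row above points to Algorithm 1 (Fisher SAM) in the paper. Since no reference code is linked, the snippet below is only a minimal PyTorch-style sketch of what a two-pass Fisher-SAM-like step could look like, assuming a diagonal empirical-Fisher proxy of the form F ≈ 1 + η·g² and the first-order perturbation ε* = γ·F⁻¹g / √(gᵀF⁻¹g). The helper name fisher_sam_step and these approximations are illustrative assumptions, not the authors' implementation.

```python
import torch

def fisher_sam_step(model, loss_fn, data, target, base_optimizer, gamma=0.1, eta=1.0):
    """One Fisher-SAM-style update (sketch):
    1) gradient at the current weights,
    2) ascend inside a Fisher-metric ball of radius gamma,
    3) gradient at the perturbed weights,
    4) restore the weights and let the base optimizer take the descent step.
    The diagonal Fisher proxy F = 1 + eta * g^2 is an assumption of this sketch."""
    model.zero_grad()
    first_loss = loss_fn(model(data), target)
    first_loss.backward()

    eps, norm_sq = [], 0.0
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            g = p.grad
            e = g / (1.0 + eta * g.pow(2))        # F^{-1} g with the diagonal Fisher proxy
            norm_sq += (g * e).sum().item()       # accumulate g^T F^{-1} g
            eps.append(e)
        scale = gamma / (norm_sq ** 0.5 + 1e-12)  # radius-gamma step in the Fisher metric
        for p, e in zip(model.parameters(), eps): # ascend: theta <- theta + eps*
            if e is not None:
                p.add_(e, alpha=scale)

    model.zero_grad()
    loss_fn(model(data), target).backward()       # gradient at the perturbed weights

    with torch.no_grad():                         # undo the perturbation before stepping
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e, alpha=scale)
    base_optimizer.step()                         # descent with the sharpness-aware gradient
    model.zero_grad()
    return first_loss.item()
```

In use, base_optimizer would be the SGD optimiser described in the Experiment Setup row; the second sketch below wires the two together.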
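
The Experiment Setup row fixes most of the CIFAR training recipe (SGD with momentum 0.9, weight decay 0.0005, initial learning rate 0.1, cosine scheduling, 200 epochs, batch size 128, label smoothing 0.1, and FSAM hyperparameters γ = 0.1, η = 1.0). A hedged sketch of how that recipe could be assembled is shown below; the ResNet-18 backbone stands in for the paper's WRN/PyramidNet backbones, Cutout is omitted, and the loop reuses the hypothetical fisher_sam_step helper from the previous sketch.

```python
import torch
import torchvision
from torch.utils.data import DataLoader

# Hyperparameters as stated in the Experiment Setup row (CIFAR backbones);
# the backbone and augmentation details below are illustrative placeholders.
BATCH_SIZE, EPOCHS, LR, MOMENTUM, WEIGHT_DECAY = 128, 200, 0.1, 0.9, 0.0005
GAMMA, ETA = 0.1, 1.0  # FSAM hyperparameters found by grid search for CIFAR-10/100

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=torchvision.transforms.Compose([
        torchvision.transforms.AutoAugment(          # AutoAugment (Cubuk et al., 2019)
            torchvision.transforms.AutoAugmentPolicy.CIFAR10),
        torchvision.transforms.ToTensor(),           # Cutout omitted in this sketch
    ]))
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)

model = torchvision.models.resnet18(num_classes=10)  # placeholder backbone, not WRN-28-10
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing factor 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=LR, momentum=MOMENTUM,
                            weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    for data, target in train_loader:
        # two-pass FSAM-style update, using the hypothetical helper sketched above
        fisher_sam_step(model, loss_fn, data, target, optimizer, gamma=GAMMA, eta=ETA)
    scheduler.step()
```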