Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Authors: Marlon Becker, Frederick Altrock, Benjamin Risse

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization. Code is available at https://github.com/MarlonBecker/MSAM. ... In Tab. 1, we show test accuracies for MSAM and related optimizers for WideResNet-28-10, WideResNet-16-4 (Zagoruyko and Komodakis, 2016) and ResNet50 (He et al., 2016) on CIFAR100 (Krizhevsky and Hinton, 2009) and vision transformers (Dosovitskiy et al., 2021) and ImageNet-1k (Deng et al., 2009) next to the training speed.
Researcher Affiliation Academia Marlon Becker, Frederick Altrock, Benjamin Risse; University of Muenster; EMAIL
Pseudocode Yes Algorithm 1: SGD with Momentum-SAM (MSAM; efficient implementation)
Input: training data $S$, momentum $\mu$, learning rate $\eta$, perturbation strength $\rho$
Initialize: weights $\tilde{w}_0 \leftarrow$ random, momentum vector $v_0 \leftarrow 0$
for $t \leftarrow 0$ to $T$ do
    sample batch $B_t \subset S$
    $L_{B_t}(\tilde{w}_t) = \frac{1}{|B_t|} \sum_{(x,y) \in B_t} l(\tilde{w}_t, x, y)$
    $g_{\mathrm{MSAM}} = \nabla L_{B_t}(\tilde{w}_t)$ // inc pert.
    $w_t = \tilde{w}_t + \rho \frac{v_t}{\|v_t\|}$ // remove last pert.
    $v_{t+1} = \mu v_t + g_{\mathrm{MSAM}}$ // update momentum
    $w_{t+1} = w_t - \eta v_{t+1}$ // SGD step
    $\tilde{w}_{t+1} = w_{t+1} - \rho \frac{v_{t+1}}{\|v_{t+1}\|}$ // next pert.
end
$w_T = \tilde{w}_T + \rho \frac{v_T}{\|v_T\|}$ // remove pert.
return $w_T$
Open Source Code Yes Code is available at https://github.com/MarlonBecker/MSAM.
Open Datasets Yes In Tab. 1, we show test accuracies for MSAM and related optimizers for WideResNet-28-10, WideResNet-16-4 (Zagoruyko and Komodakis, 2016) and ResNet50 (He et al., 2016) on CIFAR100 (Krizhevsky and Hinton, 2009) and vision transformers (Dosovitskiy et al., 2021) and ImageNet-1k (Deng et al., 2009) next to the training speed.
Dataset Splits Yes For CIFAR100 trainings and normalized inputs to mean 0 and standard deviation 1. For ImageNet trainings we used Inception-like preprocessing (Szegedy et al., 2015) with 224x224 resolution, normalized inputs to mean 0 and std 1 and clipped gradients L2-norms to 1.0.
Hardware Specification Yes Experiments were performed on up to 4 NVIDIA A100 GPUs.
Software Dependencies No The paper mentions optimizers like SGD and Adam and a learning rate scheduler, but does not provide specific version numbers for any software libraries or frameworks used (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup Yes Table A.3: Training Hyperparameters
Columns: CIFAR100 (WideResNets, ResNet50); ImageNet (ResNets, ViTs)
Base Optimizer: SGD, SGD, SGD, AdamW
Epochs: 200, 200, 100, 90/300
Learning Rate: 0.5, 0.1, 1, 1e-3
LR-Scheduler: cos, cos, cos, cos + linear warm-up (8 epochs)
Label Smoothing: 0.1, 0.1, 0.1
Batch Size: 256, 256, 1024, 1024
Weight Decay: 5e-4, 1e-3, 1e-4, 0.1
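The update order in Algorithm 1 (Pseudocode row above) can be traced with a minimal pure-Python sketch. This is an illustration, not the authors' released code. The names `msam_minimize`, `grad_fn`, `quadratic_grad`, `quadratic_loss` and `l2_norm` are hypothetical, the toy quadratic loss stands in for a real training objective, the gradient is computed on the full objective rather than on sampled batches, and the `eps` guard is an assumption added here because the momentum vector starts at zero.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def quadratic_grad(w, curvatures=(1.0, 4.0)):
    """Gradient of the toy loss 0.5 * sum(a_i * w_i**2) (hypothetical example)."""
    return [a * wi for a, wi in zip(curvatures, w)]


def quadratic_loss(w, curvatures=(1.0, 4.0)):
    """Toy loss used only to check that the sketch makes progress."""
    return 0.5 * sum(a * wi * wi for a, wi in zip(curvatures, w))


def msam_minimize(grad_fn, w0, lr=0.1, momentum=0.9, rho=0.05,
                  steps=500, eps=1e-12):
    """Follow the step order of Algorithm 1 with a full-batch gradient.

    grad_fn(w) returns the loss gradient at w. eps is an assumption that
    avoids dividing by zero while the momentum vector is still zero.
    """
    w_tilde = list(w0)        # perturbed weights; no perturbation yet since v = 0
    v = [0.0] * len(w0)       # momentum vector, initialised to zero
    for _ in range(steps):
        g = grad_fn(w_tilde)  # gradient at the perturbed point
        scale = rho / (l2_norm(v) + eps)
        w = [wt + scale * vi for wt, vi in zip(w_tilde, v)]   # remove last perturbation
        v = [momentum * vi + gi for vi, gi in zip(v, g)]      # update momentum
        w = [wi - lr * vi for wi, vi in zip(w, v)]            # SGD step
        scale = rho / (l2_norm(v) + eps)
        w_tilde = [wi - scale * vi for wi, vi in zip(w, v)]   # apply next perturbation
    scale = rho / (l2_norm(v) + eps)
    return [wt + scale * vi for wt, vi in zip(w_tilde, v)]    # remove final perturbation


if __name__ == "__main__":
    start = [3.0, -2.0]
    end = msam_minimize(quadratic_grad, start)
    print("loss before:", quadratic_loss(start), "loss after:", quadratic_loss(end))
```

In this sketch, setting `rho=0.0` makes every perturbation vanish, so the loop reduces to plain SGD with momentum. Only one gradient evaluation happens per step, which is the property the algorithm's "efficient implementation" label refers to.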