Sharpness-Aware Minimization Leads to Low-Rank Features

Authors: Maksym Andriushchenko, Dara Bahri, Hossein Mobahi, Nicolas Flammarion

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3, we present extensive empirical evidence of low-rank features for various models (ResNets, ViTs, MLP-Mixers) trained with SAM on four classification tasks (CIFAR-10/100, Tiny ImageNet, ImageNet-1k) as well as for contrastive text-image training (MS-COCO).
Researcher Affiliation | Collaboration | Maksym Andriushchenko (EPFL, maksym.andriushchenko@epfl.ch); Dara Bahri (Google Research, dbahri@google.com); Hossein Mobahi (Google Research, hmobahi@google.com); Nicolas Flammarion (EPFL, nicolas.flammarion@epfl.ch)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We make our code available at https://github.com/tml-epfl/sam-low-rank-features.
Open Datasets | Yes | We train a PreActResNet-18 (He et al., 2016b) with standard augmentations on standard deep learning datasets: CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009), Tiny ImageNet (Le and Yang, 2015), and ImageNet-1k (Deng et al., 2009), as well as contrastive learning on MS-COCO (Lin et al., 2014).
Dataset Splits | No | The paper mentions evaluating on training examples and implies the use of test sets, but does not provide explicit train/validation/test splits (specific percentages, sample counts, or explicit mention of a validation set) for all experiments.
Hardware Specification | Yes | We performed all experiments on a single Nvidia A100 GPU; we used an internal cluster for all experiments except those on MS-COCO, for which we used a cloud provider.
Software Dependencies | No | The paper mentions using Adam (Kingma and Ba, 2014) but does not provide version numbers for key software components such as the deep learning framework (e.g., PyTorch or TensorFlow), Python, or CUDA.
Experiment Setup | Yes | We train these models with batch size 256 for 200 epochs using standard augmentations (random crops and random mirroring). For the minimal setting, we use plain SGD with learning rate 0.05. For the state-of-the-art setting, we use SGD with learning rate 0.1 (decayed by a factor of 10 after 50% and 90% of epochs), momentum parameter 0.9, and weight decay 0.0005... We use Adam (Kingma and Ba, 2014) with learning rate 0.0001, decayed to 0 using a cosine schedule. We train these models with batch size 128 for 25 epochs without data augmentations.
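
The Experiment Setup row above pins down most of the image-classification hyperparameters, so a short sketch can make that configuration concrete. The PyTorch snippet below is a minimal illustration, not the authors' released code: the model and data-loader constructors and the SAM radius RHO are assumptions, while the optimizer, step-decay points (50% and 90% of epochs), momentum, weight decay, batch size, and epoch count follow the quoted setup; the two-pass update is the standard SAM step of Foret et al. (2021).

```python
# Minimal sketch of the "state-of-the-art" ResNet setup quoted above.
# make_preact_resnet18(), make_cifar10_loader(), and RHO are assumptions,
# not values taken from the quoted text.
import torch
import torch.nn.functional as F

EPOCHS, BATCH_SIZE, RHO = 200, 256, 0.1  # RHO is a placeholder perturbation radius

model = make_preact_resnet18(num_classes=10)    # hypothetical model constructor
train_loader = make_cifar10_loader(BATCH_SIZE)  # hypothetical loader with crops + mirroring

opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
sched = torch.optim.lr_scheduler.MultiStepLR(
    opt, milestones=[int(0.5 * EPOCHS), int(0.9 * EPOCHS)], gamma=0.1)

for epoch in range(EPOCHS):
    for x, y in train_loader:
        # First pass: gradient at the current weights.
        F.cross_entropy(model(x), y).backward()
        with torch.no_grad():
            grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                                       for p in model.parameters() if p.grad is not None))
            eps = [RHO * p.grad / (grad_norm + 1e-12) if p.grad is not None else None
                   for p in model.parameters()]
            for p, e in zip(model.parameters(), eps):
                if e is not None:
                    p.add_(e)                    # ascend to the perturbed point
        opt.zero_grad()
        # Second pass: gradient at the perturbed weights (the SAM gradient).
        F.cross_entropy(model(x), y).backward()
        with torch.no_grad():
            for p, e in zip(model.parameters(), eps):
                if e is not None:
                    p.sub_(e)                    # restore the original weights
        opt.step()                               # SGD step with the SAM gradient
        opt.zero_grad()
    sched.step()
```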
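The Research Type row reports low-rank features for SAM-trained models, which presupposes a way to measure the rank of learned representations. One common criterion is to count how many principal directions are needed to explain a fixed fraction (e.g., 99%) of the variance of the penultimate-layer features; whether this matches the paper's exact definition should be checked against the released code. The sketch below implements that criterion; the 99% threshold and the get_penultimate_features() helper are assumptions.

```python
# Illustrative feature-rank measurement: number of principal directions
# needed to explain a given fraction of the variance of a feature matrix.
# The 99% default and get_penultimate_features() are assumptions, not
# taken verbatim from the paper.
import torch

@torch.no_grad()
def feature_rank(features: torch.Tensor, explained: float = 0.99) -> int:
    """features: (n_examples, feature_dim) matrix of penultimate-layer activations."""
    centered = features - features.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(centered)        # PCA spectrum of the centered matrix
    var = s ** 2
    ratio = torch.cumsum(var, dim=0) / var.sum()
    return int((ratio < explained).sum().item()) + 1

# Usage sketch: compare a SAM-trained and an SGD-trained model.
# feats_sam = get_penultimate_features(model_sam, loader)  # hypothetical helper
# feats_sgd = get_penultimate_features(model_sgd, loader)
# print(feature_rank(feats_sam), feature_rank(feats_sgd))
```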