Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Authors: Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed. |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Google Research. EMAIL¹, vaishnavh@google.com² |
| Pseudocode | Yes | The architecture is defined by the following pseudo-PyTorch: `torch.nn.Sequential(torch.nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), torch.nn.ReLU(inplace=True), torch.nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), torch.nn.ReLU(inplace=True), torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), torch.nn.ReLU(inplace=True), torch.nn.Flatten(), torch.nn.Linear(n_features, num_classes))` |
| Open Source Code | No | The paper does not provide any explicit statement about making its source code available or a link to a code repository. |
| Open Datasets | Yes | Datasets. We use four datasets in our experiments, each annotated by two features: CelebA (Liu et al., 2015), Waterbirds (Sagawa et al., 2019), CIFAR-MNIST (binary) (Shah et al., 2020), and FMNIST-MNIST (5-class) (Kirichenko et al., 2022). |
| Dataset Splits | Yes | For all datasets, we use the standard train/validation/test split, and when a validation set is not provided, we use a random 90/10 split of the training set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions "pseudo-PyTorch" for describing architectures and data augmentations but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Parameters and sweeps. For the toy experiments, we choose a constant learning rate of 0.01, a batch size of 5, 300 training points, no momentum, and no weight decay. For the CIFAR-MNIST and FMNIST-MNIST experiments, we sweep over the learning rates {0.01, 0.05, 0.1} and the phantom hyperparameter ρ over {0.0, 0.01, 0.03, 0.05, 0.07, 0.1, 0.2}. We use a batch size of 100, a cosine learning rate schedule, a momentum parameter of 0.9, and no weight decay. We normalize the images by the mean pixel value. Otherwise, we do not use data augmentation. For the CelebA and Waterbirds experiments, we sweep over the learning rates {0.0005, 0.001, 0.005, 0.01} and the ρ parameter {0.0, 0.01, 0.02, 0.05, 0.07}. We use a batch size of 128, a cosine learning rate schedule, a momentum parameter of 0.9, and a weight decay of 10⁻⁴. |
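
The Pseudocode and Experiment Setup rows leave two quantities implicit: the value of `n_features` passed to the final `Linear` layer, and the number of configurations in each sweep. The following is a minimal sketch of how both could be derived, not the authors' code. It assumes the standard `Conv2d` output-size formula with dilation 1; the helper names (`conv_out`, `n_features`, `grid`) and the 32x32 input used in the usage note are hypothetical and do not come from the paper.

```python
from itertools import product

# (kernel_size, stride, padding, out_channels) for the three Conv2d layers
# quoted in the Pseudocode row.
CONV_LAYERS = [(5, 2, 2, 32), (3, 2, 1, 64), (3, 2, 1, 128)]


def conv_out(size, kernel, stride, padding):
    """Spatial output size of one Conv2d layer, assuming dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def n_features(height, width):
    """Length of the flattened vector that reaches the final Linear layer."""
    channels = 3  # input channels of the first Conv2d
    for kernel, stride, padding, out_channels in CONV_LAYERS:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        channels = out_channels
    return channels * height * width


# Sweep values copied from the Experiment Setup row.
SWEEPS = {
    "CIFAR-MNIST / FMNIST-MNIST": {
        "lr": [0.01, 0.05, 0.1],
        "rho": [0.0, 0.01, 0.03, 0.05, 0.07, 0.1, 0.2],
    },
    "CelebA / Waterbirds": {
        "lr": [0.0005, 0.001, 0.005, 0.01],
        "rho": [0.0, 0.01, 0.02, 0.05, 0.07],
    },
}


def grid(name):
    """All (learning rate, rho) combinations for one dataset group."""
    sweep = SWEEPS[name]
    return [{"lr": lr, "rho": rho}
            for lr, rho in product(sweep["lr"], sweep["rho"])]
```

Under these assumptions, a hypothetical 32x32 input gives `n_features(32, 32) == 2048`, and the two grids contain 21 and 20 configurations respectively. The actual input resolution per dataset is not stated in the rows above, so `n_features` should be recomputed for the real image size.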