Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Authors: Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.
Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Google Research; {jspringer,raditi}@cmu.edu¹, vaishnavh@google.com²
Pseudocode | Yes | The architecture is defined by the following pseudo-PyTorch:

    torch.nn.Sequential(
        torch.nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.Flatten(),
        torch.nn.Linear(n_features, num_classes)
    )
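For concreteness, the pseudocode above reconstructs as the runnable module below. This is a minimal sketch under our own assumptions, not values stated in the quoted pseudocode: 32x32 RGB inputs (CIFAR-sized, matching the CIFAR-MNIST experiments) and a binary output head. With three stride-2 convolutions the resolution goes 32 -> 16 -> 8 -> 4, so n_features = 128 * 4 * 4 = 2048.

    import torch

    # Assumption: 32x32 RGB inputs. Each stride-2 convolution halves the
    # spatial resolution (32 -> 16 -> 8 -> 4), so the flattened feature
    # vector has 128 * 4 * 4 = 2048 entries.
    n_features = 128 * 4 * 4
    num_classes = 2  # assumed binary task, e.g. CIFAR-MNIST

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
        torch.nn.ReLU(inplace=True),
        torch.nn.Flatten(),
        torch.nn.Linear(n_features, num_classes),
    )

    # Sanity check: a batch of 32x32 images maps to one logit per class.
    x = torch.randn(8, 3, 32, 32)
    assert model(x).shape == (8, num_classes)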
Open Source Code | No | The paper does not provide any explicit statement about making its source code available or a link to a code repository.
Open Datasets | Yes | Datasets. We use four datasets in our experiments, each annotated by two features: CelebA (Liu et al., 2015), Waterbirds (Sagawa et al., 2019), CIFAR-MNIST (binary) (Shah et al., 2020), and FMNIST-MNIST (5-class) (Kirichenko et al., 2022).
Dataset Splits | Yes | For all datasets, we use the standard train/validation/test split, and when a validation set is not provided, we use a random 90/10 split of the training set.
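A minimal sketch of that 90/10 fallback split, assuming map-style PyTorch datasets; the stand-in tensors and the seed are illustrative placeholders, not values from the paper:

    import torch
    from torch.utils.data import TensorDataset, random_split

    # Stand-in for a training set that ships without an official validation
    # split (in practice e.g. the CIFAR-MNIST training set).
    train_set = TensorDataset(torch.randn(300, 3, 32, 32),
                              torch.randint(0, 2, (300,)))

    # Random 90/10 train/validation split.
    n_val = len(train_set) // 10
    train_subset, val_subset = random_split(
        train_set,
        [len(train_set) - n_val, n_val],
        generator=torch.Generator().manual_seed(0),  # assumed seed
    )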
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper uses "pseudo-PyTorch" to describe architectures and data augmentations but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Parameters and sweeps. For the toy experiments, we choose a constant learning rate of 0.01, a batch size of 5, 300 training points, no momentum, and no weight decay. For the CIFAR-MNIST and FMNIST-MNIST experiments, we sweep over the learning rates {0.01, 0.05, 0.1} and the SAM hyperparameter ρ over {0.0, 0.01, 0.03, 0.05, 0.07, 0.1, 0.2}. We use a batch size of 100, a cosine learning rate schedule, a momentum parameter of 0.9, and no weight decay. We normalize the images by the mean pixel value. Otherwise, we do not use data augmentation. For the CelebA and Waterbirds experiments, we sweep over the learning rates {0.0005, 0.001, 0.005, 0.01} and the ρ parameter {0.0, 0.01, 0.02, 0.05, 0.07}. We use a batch size of 128, a cosine learning rate schedule, a momentum parameter of 0.9, and a weight decay of 10⁻⁴.
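To make the swept quantities concrete, below is a minimal sketch of one SAM training step with the reported optimizer settings (SGD with momentum 0.9, cosine learning-rate schedule, batch size 100, and one learning rate and one ρ value taken from the sweeps). SAM is not part of core PyTorch, so the two-step ascend-then-descend logic is our own illustrative implementation, and the linear model and random batch are placeholders:

    import torch

    model = torch.nn.Linear(2048, 2)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
    criterion = torch.nn.CrossEntropyLoss()
    rho = 0.05  # one value from the reported rho sweep; rho = 0 recovers plain SGD

    def sam_step(x, y):
        # 1) Gradient at the current weights.
        criterion(model(x), y).backward()
        # 2) Ascend to the approximate worst-case weights in an L2 ball of radius rho.
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        perturbations = []
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    perturbations.append(None)
                    continue
                e = rho * p.grad / (grad_norm + 1e-12)
                p.add_(e)
                perturbations.append(e)
        optimizer.zero_grad()
        # 3) Gradient at the perturbed weights (the SAM gradient).
        criterion(model(x), y).backward()
        # 4) Restore the original weights, then take the base SGD step.
        with torch.no_grad():
            for p, e in zip(model.parameters(), perturbations):
                if e is not None:
                    p.sub_(e)
        optimizer.step()
        optimizer.zero_grad()

    x, y = torch.randn(100, 2048), torch.randint(0, 2, (100,))  # batch size 100
    sam_step(x, y)
    scheduler.step()  # cosine decay, typically stepped once per epoch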