SAM as an Optimal Relaxation of Bayes

Authors: Thomas Möllenhoff, Mohammad Emtiyaz Khan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical results and show that bSAM brings the best of the two worlds together. It gives an improved uncertainty estimate, similar to the best Bayesian approaches, but also improves test accuracy, just like SAM. We compare performance to many methods from the deep learning (DL) and Bayesian DL literature: SGD, Adam (Kingma & Ba, 2015), SWAG (Maddox et al., 2019), and VOGN (Osawa et al., 2019). We also compare with two SAM variants: SAM-SGD and SAM-Adam. Both are obtained by inserting the perturbed gradients into either SGD or Adam, as suggested in Foret et al. (2021); Bahri et al. (2022); Kwon et al. (2021). We compare these methods across three different neural network architectures and on five datasets of increasing complexity. The comparison is carried out with respect to four different metrics evaluated on the validation set. The metrics are test accuracy, the negative log-likelihood (NLL), expected calibration error (ECE; Guo et al., 2017) with 20 bins, and area under the ROC curve (AUROC). (See the ECE sketch below the table.)
Researcher Affiliation | Academia | Thomas Möllenhoff & Mohammad Emtiyaz Khan, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan, {thomas.moellenhoff,emtiyaz.khan}@riken.jp
Pseudocode | Yes | Algorithm 1: Our Bayesian-SAM (bSAM) is a simple modification of SAM with Adam. (See the SAM-Adam sketch below the table.)
Open Source Code | No | The paper does not provide a direct statement or link for the release of its source code.
Open Datasets | Yes | We compare these methods across three different neural network architectures and on five datasets of increasing complexity. (Referring to CIFAR-10, CIFAR-100, Tiny ImageNet, MNIST, and FMNIST in tables and text.)
Dataset Splits | No | The paper mentions evaluating metrics on the 'validation set' and selecting hyperparameters based on 'best validation accuracy', but it does not provide specific details on the dataset splits (percentages, counts, or splitting methodology) needed for reproduction.
Hardware Specification | No | The paper details experimental setups and hyperparameters in Appendix G but does not provide any specific hardware specifications such as GPU or CPU models.
Software Dependencies | No | The paper mentions various methods (SGD, Adam, SWAG, VOGN) and architectures (MLP, LeNet-5, ResNet-20) but does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We summarize all hyperparameters in Table 7. For all experiments, the hyperparameters are selected using a grid search over a moderate number of configurations to find the best validation accuracy. Our neural network outputs the natural parameters of the categorical distribution as a minimal exponential family (number of classes minus one output neurons). The loss function is the negative log-likelihood. We always use a batch size of B = 128. For SAM-SGD, SAM-Adam and bSAM we split each minibatch into m = 8 subbatches; for VOGN we set m = 16 and consider independently computed perturbations for each subbatch. (See the subbatch sketch below the table.)
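
The ECE metric cited in the Research Type row uses 20 bins. The following is a minimal NumPy sketch of the standard binned estimator from Guo et al. (2017), not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=20):
    """Expected calibration error with equal-width confidence bins.

    probs:  (N, K) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    """
    confidences = probs.max(axis=1)         # top-1 confidence per example
    predictions = probs.argmax(axis=1)      # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |avg confidence - avg accuracy|, weighted by bin population
            gap = abs(confidences[in_bin].mean() - accuracies[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```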
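The paper's Algorithm 1 (bSAM) is not reproduced here. The sketch below only illustrates the SAM-Adam baseline described in the quoted text, i.e. inserting the gradient taken at the perturbed weights into a standard Adam update (Foret et al., 2021). The default values of lr and rho and the state layout are assumptions for illustration.

```python
import numpy as np

def sam_adam_step(w, grad_fn, state, lr=1e-3, rho=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One SAM-Adam step: feed the gradient at the perturbed weights
    into a standard Adam update.

    w:       parameter vector (np.ndarray)
    grad_fn: callable returning the minibatch gradient at given weights
    state:   dict, initialized as
             {'m': np.zeros_like(w), 'v': np.zeros_like(w), 't': 0}
    """
    g = grad_fn(w)
    # Ascend to the (approximate) worst-case point on the rho-ball.
    eps_w = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sam = grad_fn(w + eps_w)              # perturbed gradient

    # Standard Adam update, driven by the perturbed gradient.
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * g_sam
    state['v'] = beta2 * state['v'] + (1 - beta2) * g_sam**2
    m_hat = state['m'] / (1 - beta1**state['t'])
    v_hat = state['v'] / (1 - beta2**state['t'])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```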
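The subbatch scheme in the Experiment Setup row (each minibatch split into m subbatches, each with an independently computed perturbation) corresponds to the m-sharpness variant of SAM. A minimal sketch of that averaging, under the assumption that the per-subbatch perturbed gradients are averaged before the optimizer step; the signature and rho value are illustrative, not the paper's implementation.

```python
import numpy as np

def m_sharpness_gradient(w, subbatch_grad_fns, rho=0.05):
    """Average of perturbed gradients over m subbatches.

    Each subbatch gets its own SAM perturbation, computed from its own
    gradient (e.g. m = 8 subbatches for a batch of B = 128).
    """
    grads = []
    for grad_fn in subbatch_grad_fns:       # one callable per subbatch
        g = grad_fn(w)
        eps_w = rho * g / (np.linalg.norm(g) + 1e-12)
        grads.append(grad_fn(w + eps_w))    # independent perturbation
    return np.mean(grads, axis=0)
```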