SAM as an Optimal Relaxation of Bayes

Authors: Thomas Möllenhoff, Mohammad Emtiyaz Khan

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical results and show that bSAM brings the best of the two worlds together. It gives an improved uncertainty estimate, similar to the best Bayesian approaches, but also improves test accuracy, just like SAM. We compare performance to many methods from the deep learning (DL) and Bayesian DL literature: SGD, Adam (Kingma & Ba, 2015), SWAG (Maddox et al., 2019), and VOGN (Osawa et al., 2019). We also compare with two SAM variants: SAM-SGD and SAM-Adam. Both are obtained by inserting the perturbed gradients into either SGD or Adam, as suggested in Foret et al. (2021); Bahri et al. (2022); Kwon et al. (2021). We compare these methods across three different neural network architectures and on five datasets of increasing complexity. The comparison is carried out with respect to four different metrics evaluated on the validation set. The metrics are test accuracy, the negative log-likelihood (NLL), expected calibration error (ECE; Guo et al., 2017) with 20 bins, and area under the ROC curve (AUROC). (See the ECE sketch below the table.)
Researcher Affiliation | Academia | Thomas Möllenhoff & Mohammad Emtiyaz Khan, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan, {thomas.moellenhoff,emtiyaz.khan}@riken.jp
Pseudocode | Yes | Algorithm 1: Our Bayesian-SAM (bSAM) is a simple modification of SAM with Adam. (See the SAM-Adam sketch below the table.)
Open Source Code | No | The paper does not provide a direct statement or link for the release of its source code.
Open Datasets | Yes | We compare these methods across three different neural network architectures and on five datasets of increasing complexity. (Referring to CIFAR-10, CIFAR-100, Tiny ImageNet, MNIST, and FMNIST in tables and text.)
Dataset Splits | No | The paper mentions evaluating metrics on the 'validation set' and selecting hyperparameters based on 'best validation accuracy', but it does not provide specific details on the dataset splits (percentages, counts, or splitting methodology) needed for reproduction.
Hardware Specification | No | The paper details experimental setups and hyperparameters in Appendix G but does not provide any specific hardware specifications such as GPU or CPU models.
Software Dependencies | No | The paper mentions various methods (SGD, Adam, SWAG, VOGN) and architectures (MLP, LeNet-5, ResNet-20) but does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We summarize all hyperparameters in Table 7. For all experiments, the hyperparameters are selected using a grid search over a moderate number of configurations to find the best validation accuracy. Our neural network outputs the natural parameters of the categorical distribution as a minimal exponential family (number of classes minus one output neurons). The loss function is the negative log-likelihood. We always use a batch size of B = 128. For SAM-SGD, SAM-Adam and bSAM we split each minibatch into m = 8 subbatches; for VOGN we set m = 16 and consider independently computed perturbations for each subbatch. (See the subbatch sketch below the table.)
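
The ECE metric cited in the Research Type row uses 20 bins. The following is a minimal NumPy sketch of the standard binned estimator from Guo et al. (2017), not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=20):
    """Expected calibration error with equal-width confidence bins.

    probs:  (N, K) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    """
    confidences = probs.max(axis=1)         # top-1 confidence per example
    predictions = probs.argmax(axis=1)      # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |avg confidence - avg accuracy|, weighted by bin population
            gap = abs(confidences[in_bin].mean() - accuracies[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```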
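The paper's Algorithm 1 (bSAM) is not reproduced here. The sketch below only illustrates the SAM-Adam baseline described in the quoted text, i.e. inserting the gradient taken at the perturbed weights into a standard Adam update (Foret et al., 2021). The default values of lr and rho and the state layout are assumptions for illustration.

```python
import numpy as np

def sam_adam_step(w, grad_fn, state, lr=1e-3, rho=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """One SAM-Adam step: feed the gradient at the perturbed weights
    into a standard Adam update.

    w:       parameter vector (np.ndarray)
    grad_fn: callable returning the minibatch gradient at given weights
    state:   dict, initialized as
             {'m': np.zeros_like(w), 'v': np.zeros_like(w), 't': 0}
    """
    g = grad_fn(w)
    # Ascend to the (approximate) worst-case point on the rho-ball.
    eps_w = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sam = grad_fn(w + eps_w)              # perturbed gradient

    # Standard Adam update, driven by the perturbed gradient.
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * g_sam
    state['v'] = beta2 * state['v'] + (1 - beta2) * g_sam**2
    m_hat = state['m'] / (1 - beta1**state['t'])
    v_hat = state['v'] / (1 - beta2**state['t'])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
```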
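The subbatch scheme in the Experiment Setup row (each minibatch split into m subbatches, each with an independently computed perturbation) corresponds to the m-sharpness variant of SAM. A minimal sketch of that averaging, under the assumption that the per-subbatch perturbed gradients are averaged before the optimizer step; the signature and rho value are illustrative, not the paper's implementation.

```python
import numpy as np

def m_sharpness_gradient(w, subbatch_grad_fns, rho=0.05):
    """Average of perturbed gradients over m subbatches.

    Each subbatch gets its own SAM perturbation, computed from its own
    gradient (e.g. m = 8 subbatches for a batch of B = 128).
    """
    grads = []
    for grad_fn in subbatch_grad_fns:       # one callable per subbatch
        g = grad_fn(w)
        eps_w = rho * g / (np.linalg.norm(g) + 1e-12)
        grads.append(grad_fn(w + eps_w))    # independent perturbation
    return np.mean(grads, axis=0)
```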