Dangers of Bayesian Model Averaging under Covariate Shift

Authors: Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew G. Wilson

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate BNNs against two deterministic baselines: a MAP solution approximated with stochastic gradient descent (SGD) with momentum [Robbins and Monro, 1951; Polyak, 1964] and a deep ensemble of 10 independently trained MAP solutions [Lakshminarayanan et al., 2017]. For BNNs, we provide results using a Gaussian prior and a more heavy-tailed Laplace prior, following Fortuin et al. [2021]. We run all methods on the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets. (See the first sketch below the table.)
Researcher Affiliation | Collaboration | Pavel Izmailov (NYU), Patrick Nicholson (Covera Health), Sanae Lotfi (NYU), Andrew Gordon Wilson (NYU)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm," nor does it present any structured, code-like blocks for its methods.
Open Source Code | Yes | Our code is available here.
Open Datasets | Yes | We run all methods on the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets.
Dataset Splits | No | The paper mentions using the MNIST and CIFAR-10 datasets and their test sets, but it does not explicitly describe a validation split or how one was constructed and used for hyperparameter tuning.
Hardware Specification | Yes | Even on the small architectures that we consider, the experiments take multiple hours on 8 NVIDIA Tesla V-100 GPUs or 8-core TPU-V3 devices [Jouppi et al., 2020].
Software Dependencies | No | The paper mentions algorithms and frameworks such as "stochastic gradient descent (SGD)", "HMC", and "ReLU activations", but it does not specify any software dependencies with version numbers (e.g., "PyTorch 1.9", "TensorFlow 2.x").
Experiment Setup | Yes | On both the CIFAR-10 and MNIST datasets we use a small convolutional network (CNN) inspired by LeNet-5 [LeCun et al., 1998], with 2 convolutional layers followed by 3 fully-connected layers. On MNIST we additionally consider a fully-connected neural network (MLP) with 2 hidden layers of 256 neurons each. For all BNN models, we run a single chain of HMC for 100 iterations, discarding the first 10 iterations as burn-in, following Izmailov et al. [2021]. In each case, we apply the EmpCov prior to the first layer and a Gaussian prior to all other layers. (See the second sketch below the table.)
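
The Research Type row describes both a deep ensemble of 10 independently trained MAP solutions and a Bayesian model average over HMC samples. The following is a minimal Python sketch, not the authors' released code, of the prediction-averaging step common to both; `models` and `predict_probs` are hypothetical stand-ins for the trained networks and a softmax-probability forward pass.

```python
import numpy as np

def ensemble_predict(models, predict_probs, x):
    """Deep ensemble / Bayesian model average: mean of per-model class probabilities.

    models        -- list of trained networks (10 MAP solutions, or HMC posterior samples)
    predict_probs -- callable(model, x) -> array of shape (n_classes,) of softmax probabilities
    x             -- a single input example
    """
    probs = np.stack([predict_probs(m, x) for m in models])  # (n_models, n_classes)
    return probs.mean(axis=0)                                 # averaged predictive distribution

# Usage (hypothetical): the predicted label is
#   np.argmax(ensemble_predict(models, predict_probs, x))
# whether `models` holds the 10 ensemble members or the post-burn-in HMC samples.
```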
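The Experiment Setup row describes the two architectures used in the experiments. Below is a minimal PyTorch sketch of what such networks might look like; the channel counts, kernel sizes, and pooling choices are assumptions in the spirit of LeNet-5 and are not taken from the authors' released code (linked in the Open Source Code row). The EmpCov/Gaussian prior specification and the HMC sampling loop are not shown.

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """LeNet-5-style CNN: 2 convolutional layers followed by 3 fully-connected layers.
    Layer widths are assumptions, not the paper's exact configuration."""
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(120), nn.ReLU(),   # infers the flattened size (MNIST or CIFAR-10)
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

class SmallMLP(nn.Module):
    """Fully-connected network with 2 hidden layers of 256 units each (MNIST only)."""
    def __init__(self, in_dim=28 * 28, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```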