Dangers of Bayesian Model Averaging under Covariate Shift
Authors: Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew G. Wilson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BNNs against two deterministic baselines: a MAP solution approximated with stochastic gradient descent (SGD) with momentum [Robbins and Monro, 1951, Polyak, 1964] and a deep ensemble of 10 independently trained MAP solutions [Lakshminarayanan et al., 2017]. For BNNs, we provide the results using a Gaussian prior and a more heavy-tailed Laplace prior following Fortuin et al. [2021]. We run all methods on the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets. |
| Researcher Affiliation | Collaboration | Pavel Izmailov (NYU); Patrick Nicholson (Covera Health); Sanae Lotfi (NYU); Andrew Gordon Wilson (NYU) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm," nor does it present any structured, code-like blocks for its methods. |
| Open Source Code | Yes | Our code is available here. |
| Open Datasets | Yes | We run all methods on the MNIST [LeCun et al., 2010] and CIFAR-10 [Krizhevsky et al., 2014] datasets. |
| Dataset Splits | No | The paper mentions using the MNIST and CIFAR-10 datasets and their test sets but does not explicitly describe a validation split or how one was constructed or used for hyperparameter tuning. |
| Hardware Specification | Yes | Even on the small architectures that we consider, the experiments take multiple hours on 8 NVIDIA Tesla V-100 GPUs or 8-core TPU-V3 devices [Jouppi et al., 2020]. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like "stochastic gradient descent (SGD)", "HMC", and "ReLU activations", but it does not specify any software dependencies with version numbers (e.g., "PyTorch 1.9", "TensorFlow 2.x"). |
| Experiment Setup | Yes | On both the CIFAR-10 and MNIST datasets we use a small convolutional network (CNN) inspired by LeNet-5 [LeCun et al., 1998], with 2 convolutional layers followed by 3 fully-connected layers. On MNIST we additionally consider a fully-connected neural network (MLP) with 2 hidden layers of 256 neurons each. For all BNN models, we run a single chain of HMC for 100 iterations discarding the first 10 iterations as burn-in, following Izmailov et al. [2021]. In each case, we apply the EmpCov prior to the first layer, and a Gaussian prior to all other layers. |
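
The Research Type row above quotes the paper's BNN priors: an isotropic Gaussian and a heavier-tailed Laplace prior following Fortuin et al. [2021]. As a minimal sketch (not the authors' code), these can be written as log densities over the flattened weights; the scale parameters `sigma` and `b` below are illustrative placeholders, not values from the paper.

```python
# Minimal sketch of the two BNN weight priors named in the Research Type row:
# an isotropic Gaussian and a heavier-tailed Laplace prior. The scales sigma
# and b are illustrative, not values reported in the paper.
import numpy as np

def gaussian_log_prior(w: np.ndarray, sigma: float = 1.0) -> float:
    """Log density of independent N(0, sigma^2) on every weight."""
    return float(np.sum(-0.5 * (w / sigma) ** 2
                        - 0.5 * np.log(2 * np.pi * sigma ** 2)))

def laplace_log_prior(w: np.ndarray, b: float = 1.0) -> float:
    """Log density of independent Laplace(0, b) on every weight (heavier tails)."""
    return float(np.sum(-np.abs(w) / b - np.log(2 * b)))

# Toy usage: the Laplace prior penalises the single large weight less severely.
w = np.array([0.1, -0.2, 5.0])
print(gaussian_log_prior(w), laplace_log_prior(w))
```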
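
The Experiment Setup row describes an MNIST MLP with 2 hidden layers of 256 neurons and a single HMC chain of 100 iterations with the first 10 discarded as burn-in, so the Bayesian model average is taken over the 90 retained samples. The sketch below, assuming ReLU activations and softmax outputs, shows how such an average would be computed; the random parameter dictionaries stand in for HMC samples and are not the paper's posterior draws.

```python
# Minimal sketch (not the released code): a 784 -> 256 -> 256 -> 10 MLP and a
# Bayesian model average over retained HMC samples. The samples below are
# random placeholders standing in for the 90 posterior draws kept after
# discarding the first 10 of 100 HMC iterations as burn-in.
import numpy as np

def mlp_logits(params: dict, x: np.ndarray) -> np.ndarray:
    """Forward pass of the 2-hidden-layer MLP with ReLU activations."""
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    h = np.maximum(h @ params["w2"] + params["b2"], 0.0)
    return h @ params["w3"] + params["b3"]

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bma_predict(samples: list, x: np.ndarray) -> np.ndarray:
    """Average the predictive class probabilities over posterior samples."""
    return np.mean([softmax(mlp_logits(p, x)) for p in samples], axis=0)

# Toy usage with random parameters in place of HMC samples.
rng = np.random.default_rng(0)
def random_params():
    return {
        "w1": rng.normal(scale=0.05, size=(784, 256)), "b1": np.zeros(256),
        "w2": rng.normal(scale=0.05, size=(256, 256)), "b2": np.zeros(256),
        "w3": rng.normal(scale=0.05, size=(256, 10)),  "b3": np.zeros(10),
    }
samples = [random_params() for _ in range(90)]   # 100 HMC iterations - 10 burn-in
x = rng.normal(size=(5, 784))                    # 5 flattened MNIST-sized inputs
print(bma_predict(samples, x).shape)             # (5, 10)
```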
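
The same row states that the EmpCov prior is applied to the first layer only. Assuming the EmpCov construction is a zero-mean Gaussian whose covariance is α Σ + ε I, with Σ the empirical covariance of the (flattened) training inputs, a minimal sketch of that covariance and of sampling first-layer weights from it looks as follows; `alpha` and `eps` are illustrative hyperparameters, not values from the paper.

```python
# Minimal sketch, not the authors' released code: a first-layer prior
# covariance built from the empirical covariance of the training inputs,
# assuming the form alpha * Sigma + eps * I. alpha and eps are illustrative.
import numpy as np

def empcov_prior_covariance(x_train: np.ndarray, alpha: float = 1.0,
                            eps: float = 1e-2) -> np.ndarray:
    """Return alpha * Sigma + eps * I, with Sigma the empirical input covariance.

    x_train: array of shape (n_examples, n_features), e.g. flattened MNIST images.
    """
    sigma = np.cov(x_train, rowvar=False)
    return alpha * sigma + eps * np.eye(x_train.shape[1])

def sample_first_layer_weights(cov: np.ndarray, n_out: int,
                               rng: np.random.Generator) -> np.ndarray:
    """Draw first-layer weights with each unit's incoming weight vector ~ N(0, cov)."""
    return rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_out).T

# Toy usage: 100 fake flattened 28x28 inputs, first layer with 256 units.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 784))
cov = empcov_prior_covariance(x_train)
w1 = sample_first_layer_weights(cov, n_out=256, rng=rng)
print(cov.shape, w1.shape)  # (784, 784) (784, 256)
```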