Bayesian Deep Learning via Subnetwork Inference

Authors: Erik Daxberger, Eric Nalisnick, James U. Allingham, Javier Antorán, José Miguel Hernández-Lobato

ICML 2021

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: Empirically, our approach compares favorably to ensembles and less expressive posterior approximations over full networks. Our experiments demonstrate that expressive subnetwork inference can outperform popular Bayesian deep learning methods that do less expressive inference over the full NN, as well as deep ensembles.

Reproducibility Variable: Researcher Affiliation
Result: Collaboration
LLM Response: University of Cambridge; Max Planck Institute for Intelligent Systems, Tübingen; University of Amsterdam; Microsoft Research; The Alan Turing Institute.

Reproducibility Variable: Pseudocode
Result: No
LLM Response: The paper describes the steps of its procedure (e.g., "Step #1: Point Estimation") in prose, but it does not include any formally structured pseudocode or algorithm blocks.
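For orientation, here is a minimal, hedged sketch of the three prose steps the review refers to (point estimation, subnetwork selection, inference over the subnetwork), written for a PyTorch toy regression setup. All helper names (`train_map`, `select_subnetwork`, `subnetwork_laplace`), the diagonal-Fisher variance proxy, and the GGN construction are illustrative assumptions following the general linearized-Laplace recipe, not the authors' released code.

```python
# Hedged sketch of the procedure's three steps on a toy regression problem.
# Helper names and the variance proxy are assumptions, not the paper's code.
import torch
import torch.nn as nn

def flat_params(model):
    # Flatten all weights into a single vector (detached copy).
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def train_map(model, X, y, weight_decay=1e-4, epochs=500):
    # Step 1: point estimation -- fit a MAP estimate of the full network.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(X), y).backward()
        opt.step()

def select_subnetwork(model, X, y, k):
    # Step 2: subnetwork selection -- keep the k weights whose marginal
    # posterior variance (here crudely proxied by an inverse diagonal
    # Fisher from squared per-example gradients) is largest.
    fisher_diag = torch.zeros_like(flat_params(model))
    for i in range(X.shape[0]):
        model.zero_grad()
        nn.functional.mse_loss(model(X[i:i+1]), y[i:i+1]).backward()
        g = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
        fisher_diag += g ** 2
    marginal_var = 1.0 / (fisher_diag + 1e-8)
    return torch.topk(marginal_var, k).indices

def subnetwork_laplace(model, X, y, idx, prior_precision=1.0):
    # Step 3: full-covariance (linearized) Laplace over the subnetwork only:
    # build the GGN restricted to the selected indices and invert it.
    rows = []
    for i in range(X.shape[0]):
        model.zero_grad()
        model(X[i:i+1]).sum().backward()
        g = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
        rows.append(g[idx])
    J = torch.stack(rows)                              # (n, k) Jacobian
    H = J.T @ J + prior_precision * torch.eye(len(idx))
    return torch.linalg.inv(H)                         # (k, k) posterior cov

torch.manual_seed(0)
X, y = torch.randn(64, 1), torch.randn(64, 1)
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
train_map(net, X, y)
idx = select_subnetwork(net, X, y, k=20)
cov = subnetwork_laplace(net, X, y, idx)
print(cov.shape)  # torch.Size([20, 20])
```

Prediction would then use this covariance over the selected weights while holding the remaining weights fixed at their MAP values.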
Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper does not contain any statements about releasing source code or provide links to a code repository for the described methodology.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: We empirically evaluate our method on a range of benchmarks for uncertainty calibration and robustness to distribution shift. We consider three benchmark settings: 1) small-scale toy regression, 2) medium-scale tabular regression, and 3) image classification with ResNet-18. Further experimental results and setup details are presented in App. A and App. D, respectively. Rotated MNIST: Following (Ovadia et al., 2019; Antorán et al., 2020), we train all methods on MNIST and evaluate their predictive distributions on increasingly rotated digits. Corrupted CIFAR: Again following (Ovadia et al., 2019; Antorán et al., 2020), we train on CIFAR10 and evaluate on data subject to 16 different corruptions with 5 levels of intensity each (Hendrycks & Dietterich, 2019). We employ 3 tabular datasets of increasing size (input dimensionality, n. points): wine (11, 1439), kin8nm (8, 7373), and protein (9, 41157). We consider their standard train-test splits (Hernández-Lobato & Adams, 2015) and their gap variants (Foong et al., 2019b).
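As a concrete illustration of the rotated-MNIST shift protocol quoted above, here is a hedged sketch assuming torchvision and an already-trained classifier `model`; the rotation grid, batch size, and data path are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the rotated-MNIST evaluation: score one trained model
# on increasingly rotated test digits. Grid and paths are assumptions.
import torch
import torchvision
import torchvision.transforms.functional as TF

test_set = torchvision.datasets.MNIST(
    root="./data", train=False, download=True,
    transform=torchvision.transforms.ToTensor())
loader = torch.utils.data.DataLoader(test_set, batch_size=256)

@torch.no_grad()
def accuracy_at_rotation(model, angle):
    correct = total = 0
    for x, y in loader:
        x = TF.rotate(x, angle)            # rotate the whole batch
        pred = model(x).argmax(dim=-1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

# Sweep rotation angles to trace degradation under distribution shift:
# for angle in range(0, 181, 15):
#     print(angle, accuracy_at_rotation(model, angle))
```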
Reproducibility Variable: Dataset Splits
Result: Yes
LLM Response: We consider their standard train-test splits (Hernández-Lobato & Adams, 2015) and their gap variants (Foong et al., 2019b)... For each split, we set aside 15% of the train data as a validation set.
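For concreteness, the 15% validation carve-out could be implemented as below; the use of PyTorch's `random_split` and the fixed seed are assumptions, since the paper does not specify the mechanism.

```python
# Minimal sketch of holding out 15% of the training data for validation.
# random_split and the seed are assumptions, not the paper's stated method.
import torch
from torch.utils.data import random_split

def train_val_split(train_set, val_fraction=0.15, seed=0):
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    return random_split(train_set, [n_train, n_val],
                        generator=torch.Generator().manual_seed(seed))
```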
Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper mentions "computational tractability on commercial hardware" but does not specify any particular CPU, GPU, or other hardware components used for experiments.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: We use a Dropout probability of 0.1 and a prior precision of λ = 4 × 10⁴ for diagonal Laplace, found via grid search. We use a prior precision of λ = 500, found via grid search.
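The quoted prior precisions were found via grid search on held-out data; a minimal sketch of such a search appears below. The grid values, the validation log-likelihood criterion, and the `fit_and_score` callable are hypothetical, since the paper does not report the exact grid.

```python
# Hedged sketch of grid-searching the prior precision lambda by validation
# score. The grid and scoring callable are assumptions for illustration.
def grid_search_prior_precision(fit_and_score,
                                grid=(1e0, 1e1, 1e2, 5e2, 1e3, 4e4)):
    """fit_and_score(lam) -> validation log-likelihood for precision lam."""
    scores = {lam: fit_and_score(lam) for lam in grid}
    best = max(scores, key=scores.get)  # precision with highest score
    return best, scores
```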