Bayesian Deep Learning via Subnetwork Inference
Authors: Erik Daxberger, Eric Nalisnick, James U. Allingham, Javier Antorán, José Miguel Hernández-Lobato
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our approach compares favorably to ensembles and less expressive posterior approximations over full networks. Our experiments demonstrate that expressive subnetwork inference can outperform popular Bayesian deep learning methods that do less expressive inference over the full NN as well as deep ensembles. |
| Researcher Affiliation | Collaboration | 1 University of Cambridge; 2 Max Planck Institute for Intelligent Systems, Tübingen; 3 University of Amsterdam; 4 Microsoft Research; 5 The Alan Turing Institute. |
| Pseudocode | No | The paper describes the steps of its procedure (e.g., "Step #1: Point Estimation") in prose, but it does not include any formally structured pseudocode or algorithm blocks (an illustrative sketch of the three-step procedure appears after this table). |
| Open Source Code | No | The paper does not contain any statements about releasing source code or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | We empirically evaluate our method on a range of benchmarks for uncertainty calibration and robustness to distribution shift. We consider three benchmark settings: 1) small-scale toy regression, 2) medium-scale tabular regression, and 3) image classification with ResNet-18. Further experimental results and setup details are presented in App. A and App. D, respectively. Rotated MNIST: Following (Ovadia et al., 2019; Antorán et al., 2020), we train all methods on MNIST and evaluate their predictive distributions on increasingly rotated digits (a sketch of this evaluation appears after the table). Corrupted CIFAR: Again following (Ovadia et al., 2019; Antorán et al., 2020), we train on CIFAR10 and evaluate on data subject to 16 different corruptions with 5 levels of intensity each (Hendrycks & Dietterich, 2019). We employ 3 tabular datasets of increasing size (input dimensionality, n. points): wine (11, 1439), kin8nm (8, 7373) and protein (9, 41157). We consider their standard train-test splits (Hernández-Lobato & Adams, 2015) and their gap variants (Foong et al., 2019b). |
| Dataset Splits | Yes | We consider their standard train-test splits (Hernández-Lobato & Adams, 2015) and their gap variants (Foong et al., 2019b)... For each split, we set aside 15% of the train data as a validation set. (A minimal split sketch appears after the table.) |
| Hardware Specification | No | The paper mentions "computational tractability on commercial hardware" but does not specify any particular CPU, GPU, or other hardware components used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | We use a Dropout probability of 0.1 and a prior precision of λ = 4 × 10⁴ for diagonal Laplace, found via grid search. We use a prior precision of λ = 500, found via grid search. (A sketch of such a grid search appears after the table.) |
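Since the paper presents its procedure only in prose, the following is a minimal, runnable sketch of the three steps it names, not the authors' code. The toy model and data are invented, Step 2 uses a per-example squared-gradient (diagonal-Fisher) proxy for the marginal posterior variances that the paper's Wasserstein criterion ranks, and the full-covariance Laplace step is only indicated in a comment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 8), torch.randn(256, 1)  # toy regression data (illustrative)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

# Step 1: point estimation -- train the full network to a MAP-style fit.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(model(X), y).backward()
    opt.step()

# Step 2: subnetwork selection -- score each weight by a diagonal proxy for its
# posterior variance, 1 / (diagonal Fisher + prior precision), and keep the
# k highest-variance weights.
prior_precision = 1.0  # illustrative value
fisher_diag = [torch.zeros_like(p) for p in model.parameters()]
for xi, yi in zip(X, y):
    model.zero_grad()
    nn.functional.mse_loss(model(xi), yi).backward()
    for f, p in zip(fisher_diag, model.parameters()):
        f += p.grad.detach() ** 2
variances = torch.cat([(1.0 / (f + prior_precision)).flatten() for f in fisher_diag])
k = 100  # subnetwork size (illustrative)
subnet_idx = torch.topk(variances, k).indices

# Step 3 (omitted here): run a full-covariance Laplace approximation over only
# the k selected weights, keeping all remaining weights fixed at their MAP values.
print(f"selected {k} of {variances.numel()} weights for subnetwork inference")
```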
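The rotated-MNIST protocol quoted in the Open Datasets row can be sketched with standard tooling. This assumes torchvision; `evaluate` is a hypothetical metric helper, and the angle grid is illustrative.

```python
import torch
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

def rotated_mnist_loader(angle, batch_size=256):
    """MNIST test set with every digit rotated by `angle` degrees."""
    tfm = transforms.Compose([
        transforms.ToTensor(),
        transforms.Lambda(lambda img: TF.rotate(img, angle)),
    ])
    ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
    return torch.utils.data.DataLoader(ds, batch_size=batch_size)

# Evaluate a model trained on un-rotated MNIST under increasing rotation:
# for angle in range(0, 181, 15):
#     metrics = evaluate(model, rotated_mnist_loader(angle))  # hypothetical helper
```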
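The 15% validation holdout from the Dataset Splits row, as a minimal sketch; the NumPy API and the seed are assumptions, not the paper's code.

```python
import numpy as np

def train_val_split(X_train, y_train, val_fraction=0.15, seed=0):
    """Set aside `val_fraction` of the training data as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_train))
    n_val = int(round(val_fraction * len(X_train)))
    val, tr = idx[:n_val], idx[n_val:]
    return (X_train[tr], y_train[tr]), (X_train[val], y_train[val])
```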
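The Experiment Setup row quotes prior precisions "found via grid search". As a hedged illustration, here is such a grid search made concrete on Bayesian linear regression, where the posterior and validation log-likelihood are available in closed form; the data, grid, and noise variance are all assumptions, and the paper applies the same selection to Laplace approximations of NNs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.3 * rng.normal(size=200)
Xtr, ytr, Xval, yval = X[:150], y[:150], X[150:], y[150:]
noise_var = 0.09  # assumed known for this illustration

def val_log_lik(lam):
    """Validation log-likelihood under prior precision `lam` (closed form)."""
    A = Xtr.T @ Xtr / noise_var + lam * np.eye(5)     # posterior precision
    mean = np.linalg.solve(A, Xtr.T @ ytr / noise_var)
    cov = np.linalg.inv(A)
    pred_var = noise_var + np.einsum("ij,jk,ik->i", Xval, cov, Xval)
    resid = yval - Xval @ mean
    return -0.5 * np.sum(np.log(2 * np.pi * pred_var) + resid**2 / pred_var)

grid = np.logspace(-3, 5, num=17)  # candidate prior precisions (illustrative)
best = max(grid, key=val_log_lik)
print(f"grid-searched prior precision λ = {best:.2e}")
```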