Collapsed Variational Bounds for Bayesian Neural Networks

Authors: Marcin Tomczak, Siddharth Swaroop, Andrew Foong, Richard Turner

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the result for each and the supporting LLM response (quoted from the paper where available):
Research Type: Experimental. LLM Response: "Our experiments show that the new bounds significantly improve the performance of Gaussian mean-field VI applied to BNNs on a variety of data sets, demonstrating that mean-field VI works well even in deep models. In this section we explore the predictive performance of the introduced variational bounds and benchmark them together with other algorithms."
Researcher Affiliation: Academia. LLM Response: "University of Cambridge, Cambridge, UK. {mbt27,ss2163,ykf21,ret26}@cam.ac.uk"
Pseudocode: Yes. LLM Response: "Algorithm 1: Deriving a collapsed bound L for BNN in four steps."
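The assessment does not reproduce Algorithm 1 itself, but the general mechanism of a collapsed bound can be illustrated with a standard worked example: if the prior variance s is treated as a free parameter, the KL term of the ELBO can be minimized over s in closed form, and substituting the optimum back gives a bound that is at least as tight as the ELBO for any fixed s. The derivation below is a hedged, generic sketch under the illustrative choices p(w | s) = N(0, s I_d) and q(w) = N(mu, diag(sigma^2)); it is not the paper's exact four-step construction.

```latex
% Generic collapsed-bound sketch (illustrative; not the paper's Algorithm 1).
% Prior p(w|s) = N(0, s I_d), mean-field posterior q(w) = N(mu, diag(sigma^2)),
% data set D, weight dimension d.
\begin{align}
\mathcal{L}(q, s)
  &= \mathbb{E}_{q(w)}\big[\log p(\mathcal{D}\mid w)\big]
     - \mathrm{KL}\big(q(w)\,\|\,p(w\mid s)\big), \\
\mathrm{KL}\big(q\,\|\,p(\cdot\mid s)\big)
  &= \tfrac{1}{2}\Big[\tfrac{1}{s}\textstyle\sum_{i=1}^{d}\big(\mu_i^2+\sigma_i^2\big)
     - d + d\log s - \textstyle\sum_{i=1}^{d}\log\sigma_i^2\Big], \\
s^\star &= \arg\min_s \mathrm{KL}\big(q\,\|\,p(\cdot\mid s)\big)
         = \tfrac{1}{d}\textstyle\sum_{i=1}^{d}\big(\mu_i^2+\sigma_i^2\big), \\
\mathcal{L}^\star(q) &= \max_s \mathcal{L}(q, s)
  = \mathbb{E}_{q(w)}\big[\log p(\mathcal{D}\mid w)\big]
    - \tfrac{d}{2}\log s^\star
    + \tfrac{1}{2}\textstyle\sum_{i=1}^{d}\log\sigma_i^2 .
\end{align}
```

Since max_s L(q, s) is at least L(q, s_0) for any fixed s_0, optimizing the collapsed objective can only tighten the bound; the paper's collapsed bounds for BNNs rest on this same principle of optimizing out selected parameters analytically.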
Open Source Code: Yes. LLM Response: "We provide the code implementing the introduced algorithms at https://github.com/marctom/collapsed_bnns."
Open Datasets: Yes. LLM Response: "We first consider 20 train-test splits for 8 UCI regression data sets [14]. Classifying vectorized MNIST images...is a standard benchmark for BNNs [7]. ... test it on the fashion MNIST data set [73] ... LeNet architecture [44] on 6 data sets: MNIST, fashion MNIST, K-MNIST [10], CIFAR10, CIFAR100 [41] and SVHN [56]. ... large CNNs: ResNet18 [25], ShuffleNet [48] and AlexNet [42]. We use CIFAR10, CIFAR100, STL10 [11] and SVHN."
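All of the image data sets named above are available through torchvision, so a minimal loading sketch may help reproduction; the transform and the "data" root directory below are assumptions, and the 8 UCI regression data sets [14] must be fetched separately from the UCI repository.

```python
from torchvision import datasets, transforms

# Placeholder transform; per-dataset normalization is omitted for brevity.
to_tensor = transforms.ToTensor()

train_sets = {
    "mnist":    datasets.MNIST("data", train=True, download=True, transform=to_tensor),
    "fmnist":   datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor),
    "kmnist":   datasets.KMNIST("data", train=True, download=True, transform=to_tensor),
    "cifar10":  datasets.CIFAR10("data", train=True, download=True, transform=to_tensor),
    "cifar100": datasets.CIFAR100("data", train=True, download=True, transform=to_tensor),
    "svhn":     datasets.SVHN("data", split="train", download=True, transform=to_tensor),
    "stl10":    datasets.STL10("data", split="train", download=True, transform=to_tensor),
}
```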
Dataset Splits: No. LLM Response: "We first consider 20 train-test splits for 8 UCI regression data sets [14]."
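Because the split-generation procedure is not specified, reproducing the UCI experiments requires an assumption. A common convention in the BNN literature is 20 random 90/10 train-test splits per data set; the sketch below implements that assumed protocol (the 90/10 ratio and the seeding scheme are assumptions, not details from the paper).

```python
import numpy as np

def uci_splits(X, y, n_splits=20, test_fraction=0.1, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for random train-test splits.

    NOTE: the paper only states that 20 train-test splits are used; the
    90/10 ratio and the per-split seeding here are assumptions.
    """
    n = X.shape[0]
    n_test = int(round(test_fraction * n))
    for split in range(n_splits):
        rng = np.random.default_rng(seed + split)  # one seed per split
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        yield X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```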
Hardware Specification: No. LLM Response: The paper does not provide specific hardware details such as GPU models, CPU types, or memory configurations used for the experiments.
Software Dependencies: No. LLM Response: "We optimize the objectives for 200K steps with the ADAM optimizer [37] with default settings."
Experiment Setup: Yes. LLM Response: "We learn 2 hidden layer BNNs (results for 1 hidden layer in Appendix D) with 50 units and ReLU activations [27, 21, 67], but we use a heteroscedastic observation model p(y | f_W(x)) = N(y | f^1_W(x), exp(f^2_W(x)))... We optimize the objectives for 200K steps with the ADAM optimizer [37] with default settings. We optimize the objectives for 800 epochs (except MAP for 50 epochs and MC dropout for 100 epochs as they tend to overfit) using batch size 512 and ADAM optimizer with default parameters."
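To make the quoted setup concrete, the sketch below builds a 2-hidden-layer, 50-unit mean-field Gaussian BNN with ReLU activations and the heteroscedastic Gaussian likelihood N(y | f^1_W(x), exp(f^2_W(x))), trained with Adam at default settings. It is a hedged illustration assuming a standard reparameterized mean-field layer and a unit Gaussian prior; it is not the authors' implementation (their code is in the repository linked above), and the class names, initialization, and KL weighting are assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MeanFieldLinear(nn.Module):
    """Fully factorized Gaussian linear layer with a N(0, 1) prior.
    Illustrative only; initialization and prior are common defaults,
    not necessarily the paper's choices."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w_mu = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.w_rho = nn.Parameter(torch.full((d_out, d_in), -5.0))  # std via softplus
        self.b_mu = nn.Parameter(torch.zeros(d_out))
        self.b_rho = nn.Parameter(torch.full((d_out,), -5.0))

    def forward(self, x):
        w_std = F.softplus(self.w_rho)
        b_std = F.softplus(self.b_rho)
        w = self.w_mu + w_std * torch.randn_like(w_std)  # reparameterization trick
        b = self.b_mu + b_std * torch.randn_like(b_std)
        return F.linear(x, w, b)

    def kl(self):
        # KL( N(mu, std^2) || N(0, 1) ), summed over all weights and biases.
        def _kl(mu, std):
            return (0.5 * (std.pow(2) + mu.pow(2) - 1.0) - std.log()).sum()
        return _kl(self.w_mu, F.softplus(self.w_rho)) + _kl(self.b_mu, F.softplus(self.b_rho))

class HeteroscedasticBNN(nn.Module):
    """2 hidden layers of 50 ReLU units; outputs predictive mean and log-variance."""
    def __init__(self, d_in):
        super().__init__()
        self.layers = nn.ModuleList([
            MeanFieldLinear(d_in, 50),
            MeanFieldLinear(50, 50),
            MeanFieldLinear(50, 2),  # [f^1_W(x), f^2_W(x)] = [mean, log-variance]
        ])

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x))
        out = self.layers[-1](x)
        return out[:, :1], out[:, 1:]

    def kl(self):
        return sum(layer.kl() for layer in self.layers)

def neg_elbo(model, x, y, num_data):
    mean, logvar = model(x)
    # Heteroscedastic Gaussian NLL: -log N(y | f^1_W(x), exp(f^2_W(x)))
    nll = 0.5 * (logvar + (y - mean).pow(2) / logvar.exp() + math.log(2 * math.pi))
    return num_data * nll.mean() + model.kl()

# Training loop matching the quoted optimizer settings (Adam with defaults).
# model = HeteroscedasticBNN(d_in=X.shape[1])
# opt = torch.optim.Adam(model.parameters())
# for step in range(200_000):
#     opt.zero_grad()
#     loss = neg_elbo(model, X_batch, y_batch, num_data=len(X))
#     loss.backward()
#     opt.step()
```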