Bayesian Neural Network Priors Revisited
Authors: Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we study empirically whether isotropic Gaussian priors are indeed suboptimal for BNNs and whether this can explain the cold posterior effect. We analyze the performance of different BNN priors for different network architectures and compare them to the empirical weight distributions of standard SGD-trained neural networks. (A sketch of such a prior-fit comparison is given below the table.) |
| Researcher Affiliation | Collaboration | Vincent Fortuin, ETH Zürich, Switzerland (fortuin@inf.ethz.ch); Adrià Garriga-Alonso, University of Cambridge, United Kingdom (ag919@cam.ac.uk); Sebastian W. Ober, University of Cambridge, United Kingdom (swo25@cam.ac.uk); Florian Wenzel, Google AI Berlin, Germany (florianwenzel@google.com); Gunnar Rätsch, ETH Zürich, Switzerland (raetsch@inf.ethz.ch); Richard E. Turner, University of Cambridge, United Kingdom (ret26@eng.cam.ac.uk); Mark van der Wilk, Imperial College London, United Kingdom (m.vdwilk@imperial.ac.uk); Laurence Aitchison, University of Bristol, United Kingdom (laurence.aitchison@bristol.ac.uk) |
| Pseudocode | No | The paper does not contain any sections or blocks explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | We make our library available on GitHub (https://github.com/ratschlab/bnn_priors), inviting other researchers to join us in studying the role of priors in BNNs using state-of-the-art inference. MIT licensed. |
| Open Datasets | Yes | We trained an FCNN (Fig. 1, top) and a CNN (Fig. 1, middle) on MNIST (LeCun et al., 1998). ... Next, we did a similar analysis for a ResNet20 trained on CIFAR-10 (Krizhevsky, 2009)... |
| Dataset Splits | No | The paper mentions using train and test sets but does not provide specific percentages or counts for the training/validation/test splits, nor does it explicitly mention using a separate validation set. (The conventional MNIST split is sketched below the table.) |
| Hardware Specification | Yes | We ran the experiments on GPUs of the type NVIDIA GeForce GTX 1080 Ti and NVIDIA GeForce RTX 2080 Ti on our local cluster. |
| Software Dependencies | No | We implemented the inference and models with the PyTorch library (Paszke et al., 2019). To manage our experiments and schedule runs with several settings, we used Sacred (Greff et al., 2017) and Jug (Coelho, 2017), respectively. For the diagnostics, we also use ArviZ (Kumar et al., 2019). (No specific version numbers for these libraries are provided.) |
| Experiment Setup | Yes | For all the MNIST BNN experiments, we perform 60 cycles of SG-MCMC (Zhang et al., 2019) with 45 epochs each. We draw one sample each at the end of the respective last five epochs of each cycle, thus yielding 300 samples after 2,700 epochs, out of which we discarded the first 50 samples as a burn-in. ... We start each cycle with a learning rate of 0.01 and decay to 0 using a cosine schedule. We use a mini-batch size of 128. (The cycle and sample bookkeeping is sketched below the table.) |
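
To make the Research Type row concrete: the paper's empirical question is whether an isotropic Gaussian matches the marginal distribution of SGD-trained weights better than heavier-tailed alternatives. The sketch below is a minimal illustration of that comparison, not the authors' code; the stand-in weight array and the particular candidate distributions are assumptions for illustration only.

```python
# Hypothetical sketch: fit candidate prior families to the empirical marginal
# distribution of network weights and compare their log-likelihoods.
# In practice `weights` would be the flattened parameters of an SGD-trained
# network; a synthetic heavy-tailed array stands in for them here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
weights = rng.standard_t(df=4, size=50_000) * 0.05  # stand-in for trained weights

candidates = {
    "gaussian": stats.norm,
    "laplace": stats.laplace,
    "student_t": stats.t,
}

for name, dist in candidates.items():
    params = dist.fit(weights)                # maximum-likelihood parameter fit
    ll = dist.logpdf(weights, *params).sum()  # total log-likelihood under the fit
    print(f"{name:>10}: log-likelihood = {ll:.1f}")
```

A higher total log-likelihood for a heavy-tailed family than for the Gaussian would echo the paper's observation that isotropic Gaussian priors can be a poor match for empirical weight distributions.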
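On the Dataset Splits row: the paper does not state its splits, so the snippet below only shows the conventional torchvision MNIST setup (60,000 train / 10,000 test, no validation set carved out). This is an assumption about the standard arrangement, not the authors' exact data pipeline.

```python
# Minimal sketch of the standard MNIST split via torchvision (assumed setup).
import torch
from torchvision import datasets, transforms

tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)

# Mini-batch size 128 matches the paper's stated setting.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128)

print(len(train_set), len(test_set))  # 60000 10000; no separate validation set
```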
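Finally, the Experiment Setup row fully determines the sampling schedule, which the sketch below reproduces: 60 cycles of 45 epochs each, a cosine decay from 0.01 to 0 restarting at every cycle, one sample at each of the last five epochs of a cycle (300 total), and the first 50 samples discarded as burn-in. This is bookkeeping only, assuming a per-cycle cosine restart; it is not the authors' SG-MCMC implementation.

```python
# Sketch of the cyclical SG-MCMC schedule described in the table (assumptions
# noted in the lead-in; the epoch body itself is elided).
import math

N_CYCLES, EPOCHS_PER_CYCLE, SAMPLES_PER_CYCLE, BURN_IN, LR0 = 60, 45, 5, 50, 0.01

def lr_at(epoch_in_cycle: int) -> float:
    """Cosine decay from LR0 down to 0 over one cycle."""
    return 0.5 * LR0 * (1 + math.cos(math.pi * epoch_in_cycle / EPOCHS_PER_CYCLE))

samples = []
for cycle in range(N_CYCLES):
    for epoch in range(EPOCHS_PER_CYCLE):
        lr = lr_at(epoch)
        # ... run one epoch of SG-MCMC with mini-batch size 128 at this lr ...
        if epoch >= EPOCHS_PER_CYCLE - SAMPLES_PER_CYCLE:
            samples.append((cycle, epoch, lr))  # draw one posterior sample

kept = samples[BURN_IN:]  # discard the first 50 samples as burn-in
print(len(samples), len(kept))  # 300 collected, 250 kept after burn-in
```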