Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks

Authors: Agustinus Kristiadi, Runa Eschenhagen, Philipp Hennig

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this work, we experimentally show that the key to good MC-approximated predictive distributions is the quality of the approximate posterior itself. ... We validate the method via extensive experiments and show that refined posteriors are competitive with the much more expensive full-batch Hamiltonian Monte Carlo." (a sketch of this flow-based refinement idea follows the table)
Researcher Affiliation | Academia | Agustinus Kristiadi (University of Tübingen, agustinus.kristiadi@uni-tuebingen.de); Runa Eschenhagen (University of Tübingen, runa.eschenhagen@uni-tuebingen.de); Philipp Hennig (University of Tübingen and MPI for Intelligent Systems, Tübingen, philipp.hennig@uni-tuebingen.de)
Pseudocode | No | The paper describes the proposed method but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Code available at: https://github.com/runame/laplace-refinement.
Open Datasets | Yes | "We validate our method using standard classification datasets: Fashion-MNIST (FMNIST), CIFAR-10, and CIFAR-100."
Dataset Splits | Yes | "These prior precisions are obtained via grid search on the respective HMC baseline, maximizing validation log-likelihood. ... For each in-distribution dataset, we randomly pick 5,000 samples for validation." (a sketch of this split-and-grid-search protocol follows the table)
Hardware Specification | Yes | "Using a standard consumer GPU (Nvidia RTX 2080Ti), each epoch of a length-5 NF's optimization takes around 3.4 seconds."
Software Dependencies | No | The paper mentions software such as Pyro [39] and second-order optimization libraries [23, 24], but does not provide version numbers for any of these dependencies.
Experiment Setup | Yes | "We use prior precisions of 5·10 and 40 for the last-layer F-MNIST and CIFAR experiments, respectively. ... For all methods, we use MC integration with 20 samples to obtain the predictive distribution, except for HMC and CSGHMC, where we use S = 600 and S = 12, respectively. ... More implementation details are in Appendix B." (a sketch of MC integration for the predictive distribution follows the table)
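
For context on the method being assessed: the paper refines a Gaussian (e.g. Laplace) posterior approximation by stacking normalizing-flow layers on top of it and optimizing the ELBO. The sketch below illustrates that general recipe with planar flows in plain PyTorch; it is not the authors' implementation (their released code builds on Pyro), and `log_joint`, which should return log p(D, θ) for a batch of flat weight samples, is a hypothetical placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlanarFlow(nn.Module):
    """One planar-flow layer f(z) = z + u * tanh(w.z + b)."""
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(0.01 * torch.randn(dim))
        self.w = nn.Parameter(0.01 * torch.randn(dim))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # Constrain u so the map stays invertible (Rezende & Mohamed, 2015).
        wu = self.w @ self.u
        u_hat = self.u + (F.softplus(wu) - 1.0 - wu) * self.w / (self.w @ self.w)
        lin = z @ self.w + self.b                          # (S,)
        z_out = z + torch.tanh(lin).unsqueeze(-1) * u_hat  # (S, dim)
        psi = (1.0 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log((1.0 + psi @ u_hat).abs() + 1e-8)
        return z_out, log_det

def refine(base, flows, log_joint, steps=1000, n_samples=16, lr=1e-3):
    """Fit a length-K flow on top of a Gaussian base by maximizing the ELBO."""
    opt = torch.optim.Adam(nn.ModuleList(flows).parameters(), lr=lr)
    for _ in range(steps):
        z = base.rsample((n_samples,))     # reparameterized base samples
        log_q = base.log_prob(z)           # log-density under the base
        for flow in flows:
            z, log_det = flow(z)
            log_q = log_q - log_det        # change-of-variables correction
        loss = (log_q - log_joint(z)).mean()   # negative ELBO
        opt.zero_grad(); loss.backward(); opt.step()
    return flows
```

With `base = torch.distributions.MultivariateNormal(theta_map, laplace_cov)` and `flows = [PlanarFlow(dim) for _ in range(5)]`, this mirrors the "length-5 NF" setting whose per-epoch cost the paper reports.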
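
The dataset-split evidence combines two steps: a random 5,000-sample validation hold-out per in-distribution dataset, and a grid search over prior precisions that maximizes validation log-likelihood. A minimal sketch of that protocol, assuming torchvision for F-MNIST; the grid bounds are illustrative, and `val_loglik` is a hypothetical stand-in for fitting and evaluating a model at a given prior precision:

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.FashionMNIST(
    "data", train=True, download=True, transform=transforms.ToTensor())

# Randomly hold out 5,000 in-distribution samples for validation.
val_size = 5_000
train_set, val_set = random_split(
    full_train, [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))

# Grid-search the prior precision, keeping the value that maximizes
# validation log-likelihood. `val_loglik` is a hypothetical helper.
grid = torch.logspace(-4, 2, steps=13).tolist()
best_precision = max(grid, key=lambda p: val_loglik(p, train_set, val_set))
```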
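
The experiment setup centers on MC integration of the predictive distribution: draw S weight samples from the (refined) posterior and average the per-sample softmax outputs (S = 20 for most methods, S = 600 for HMC, S = 12 for CSGHMC). A minimal sketch, assuming the posterior samples arrive as flat parameter vectors:

```python
import torch
from torch.nn.utils import vector_to_parameters

@torch.no_grad()
def mc_predictive(model, x, weight_samples):
    """p(y | x, D) ~= (1/S) * sum_s softmax(f(x; theta_s)), theta_s ~ q(theta)."""
    probs = []
    for theta in weight_samples:                       # S flat parameter vectors
        vector_to_parameters(theta, model.parameters())
        probs.append(torch.softmax(model(x), dim=-1))
    return torch.stack(probs).mean(dim=0)              # (batch, n_classes)
```

For example, `mc_predictive(net, x_batch, [q.sample() for _ in range(20)])` with some posterior object `q` over the flattened weights reproduces the S = 20 setting.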