Tractable Function-Space Variational Inference in Bayesian Neural Networks

Authors: Tim G. J. Rudner, Zonghao Chen, Yee Whye Teh, Yarin Gal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a thorough empirical evaluation in which we compare the proposed approach to a wide array of competing methods and show that it consistently results in high predictive performance and reliable predictive uncertainty estimates, outperforming other methods in terms of predictive accuracy, robustness to distribution shifts, and uncertainty-based detection of distributionally-shifted data samples. We evaluate the proposed method on standard benchmarking datasets as well as on a safety-critical medical diagnosis task in which reliable uncertainty estimation is essential.
Researcher Affiliation | Academia | Tim G. J. Rudner (University of Oxford); Zonghao Chen (University College London); Yee Whye Teh (University of Oxford); Yarin Gal (University of Oxford)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., a section explicitly labeled "Algorithm" or "Pseudocode").
Open Source Code | Yes | Our code can be accessed at https://github.com/timrudner/FSVI.
Open Datasets | Yes | We consider supervised learning tasks on data D = {(x_n, y_n)}_{n=1}^N = (X_D, y_D) with inputs x_n ∈ X ⊆ ℝ^D and targets y_n ∈ Y, where Y ⊆ ℝ^Q for regression and Y ⊆ {0, 1}^Q for classification tasks. For models trained on the Fashion MNIST dataset, we use the MNIST and Not MNIST datasets as out-of-distribution evaluation points, while for models trained on the CIFAR-10 dataset, we use the SVHN dataset as out-of-distribution evaluation points. We use two publicly available datasets, Eye PACS [2015] and APTOS [2019], each containing RGB images of a human retina graded by a medical expert on the following scale: 0 (no DR), 1 (mild DR), 2 (moderate DR), 3 (severe DR), and 4 (proliferative DR).
Dataset Splits | Yes | The numbers of samples S and K are hyperparameters to be optimized with a validation set. For details on models, training and validation procedures, and datasets used, see Appendix D.
Hardware Specification | Yes | Compute: All experiments were run on NVIDIA A100 GPUs provided by the Alan Turing Institute.
Software Dependencies | Yes | The code is written in Python (version 3.9) using PyTorch (version 1.10.0) and CUDA (version 11.3).
Experiment Setup | Yes | For further details about model architectures and training and evaluation protocols, see Appendix D.
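The Open Datasets row describes a fixed pairing between each in-distribution training set and its out-of-distribution evaluation sets (Fashion MNIST paired with MNIST and Not MNIST; CIFAR-10 paired with SVHN). A minimal sketch of that pairing as an evaluation config — the dictionary, function, and dataset key names are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical encoding of the in-distribution -> out-of-distribution
# evaluation pairing described in the Open Datasets row; the key names
# are illustrative and do not come from the FSVI repository.
OOD_EVAL_SETS = {
    "fashion_mnist": ["mnist", "not_mnist"],  # FMNIST models: MNIST & NotMNIST as OOD
    "cifar10": ["svhn"],                      # CIFAR-10 models: SVHN as OOD
}

def ood_sets(train_dataset: str) -> list:
    """Return the out-of-distribution evaluation sets for a training dataset."""
    return OOD_EVAL_SETS[train_dataset]
```

A config like this makes the OOD-detection protocol reproducible at a glance: each trained model is scored on its own test set plus the listed OOD sets, and uncertainty-based detection is evaluated on that fixed pairing.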
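The Software Dependencies row pins Python 3.9, PyTorch 1.10.0, and CUDA 11.3. An environment matching those versions could be set up along these lines; the environment name and the conda/pip installation route are assumptions, not instructions from the repository:

```shell
# Hypothetical environment matching the reported versions
# (Python 3.9, PyTorch 1.10.0, CUDA 11.3); names and channels are assumptions.
conda create -n fsvi python=3.9 -y
conda activate fsvi
pip install torch==1.10.0+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```

Pinning the CUDA-specific wheel (`+cu113`) keeps the PyTorch build consistent with the CUDA 11.3 toolkit reported in the paper.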