Tractable Function-Space Variational Inference in Bayesian Neural Networks

Authors: Tim G. J. Rudner, Zonghao Chen, Yee Whye Teh, Yarin Gal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a thorough empirical evaluation in which we compare the proposed approach to a wide array of competing methods and show that it consistently results in high predictive performance and reliable predictive uncertainty estimates, outperforming other methods in terms of predictive accuracy, robustness to distribution shifts, and uncertainty-based detection of distributionally-shifted data samples. We evaluate the proposed method on standard benchmarking datasets as well as on a safety-critical medical diagnosis task in which reliable uncertainty estimation is essential.
Researcher Affiliation | Academia | Tim G. J. Rudner (University of Oxford); Zonghao Chen (University College London); Yee Whye Teh (University of Oxford); Yarin Gal (University of Oxford)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., a section explicitly labeled "Algorithm" or "Pseudocode").
Open Source Code | Yes | Our code can be accessed at https://github.com/timrudner/FSVI.
Open Datasets | Yes | We consider supervised learning tasks on data D = {(x_n, y_n)}_{n=1}^N = (X_D, y_D) with inputs x_n ∈ X ⊆ ℝ^D and targets y_n ∈ Y, where Y ⊆ ℝ^Q for regression and Y ⊆ {0, 1}^Q for classification tasks. For models trained on the Fashion MNIST dataset, we use the MNIST and Not MNIST datasets as out-of-distribution evaluation points, while for models trained on the CIFAR-10 dataset, we use the SVHN dataset as out-of-distribution evaluation points. We use two publicly available datasets, Eye PACS [2015] and APTOS [2019], each containing RGB images of a human retina graded by a medical expert on the following scale: 0 (no DR), 1 (mild DR), 2 (moderate DR), 3 (severe DR), and 4 (proliferative DR).
Dataset Splits | Yes | The numbers of samples S and K are hyperparameters to be optimized with a validation set. For details on models, training and validation procedures, and datasets used, see Appendix D.
Hardware Specification | Yes | Compute: All experiments were run on NVIDIA A100 GPUs provided by the Alan Turing Institute.
Software Dependencies | Yes | The code is written in Python (version 3.9) using PyTorch (version 1.10.0) and CUDA (version 11.3).
Experiment Setup | Yes | For further details about model architectures and training and evaluation protocols, see Appendix D.
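The Open Datasets row describes a fixed pairing between each in-distribution training set and its out-of-distribution evaluation sets (Fashion MNIST paired with MNIST and Not MNIST; CIFAR-10 paired with SVHN). A minimal sketch of that pairing as an evaluation config — the dictionary, function, and dataset key names are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical encoding of the in-distribution -> out-of-distribution
# evaluation pairing described in the Open Datasets row; the key names
# are illustrative and do not come from the FSVI repository.
OOD_EVAL_SETS = {
    "fashion_mnist": ["mnist", "not_mnist"],  # FMNIST models: MNIST & NotMNIST as OOD
    "cifar10": ["svhn"],                      # CIFAR-10 models: SVHN as OOD
}

def ood_sets(train_dataset: str) -> list:
    """Return the out-of-distribution evaluation sets for a training dataset."""
    return OOD_EVAL_SETS[train_dataset]
```

A config like this makes the OOD-detection protocol reproducible at a glance: each trained model is scored on its own test set plus the listed OOD sets, and uncertainty-based detection is evaluated on that fixed pairing.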
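The Software Dependencies row pins Python 3.9, PyTorch 1.10.0, and CUDA 11.3. An environment matching those versions could be set up along these lines; the environment name and the conda/pip installation route are assumptions, not instructions from the repository:

```shell
# Hypothetical environment matching the reported versions
# (Python 3.9, PyTorch 1.10.0, CUDA 11.3); names and channels are assumptions.
conda create -n fsvi python=3.9 -y
conda activate fsvi
pip install torch==1.10.0+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```

Pinning the CUDA-specific wheel (`+cu113`) keeps the PyTorch build consistent with the CUDA 11.3 toolkit reported in the paper.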