Tractable Function-Space Variational Inference in Bayesian Neural Networks
Authors: Tim G. J. Rudner, Zonghao Chen, Yee Whye Teh, Yarin Gal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough empirical evaluation in which we compare the proposed approach to a wide array of competing methods and show that it consistently results in high predictive performance and reliable predictive uncertainty estimates, outperforming other methods in terms of predictive accuracy, robustness to distribution shifts, and uncertainty-based detection of distributionally-shifted data samples. We evaluate the proposed method on standard benchmarking datasets as well as on a safety-critical medical diagnosis task in which reliable uncertainty estimation is essential. |
| Researcher Affiliation | Academia | Tim G. J. Rudner (University of Oxford); Zonghao Chen (University College London); Yee Whye Teh (University of Oxford); Yarin Gal (University of Oxford) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., a section explicitly labeled 'Algorithm' or 'Pseudocode'). |
| Open Source Code | Yes | Our code can be accessed at https://github.com/timrudner/FSVI. |
| Open Datasets | Yes | We consider supervised learning tasks on data $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N} = (X_{\mathcal{D}}, y_{\mathcal{D}})$ with inputs $x_n \in \mathcal{X} \subseteq \mathbb{R}^{D}$ and targets $y_n \in \mathcal{Y}$, where $\mathcal{Y} \subseteq \mathbb{R}^{Q}$ for regression and $\mathcal{Y} \subseteq \{0, 1\}^{Q}$ for classification tasks. For models trained on the FashionMNIST dataset, we use the MNIST and NotMNIST datasets as out-of-distribution evaluation points, while for models trained on the CIFAR-10 dataset, we use the SVHN dataset as out-of-distribution evaluation points. We use two publicly available datasets, EyePACS [2015] and APTOS [2019], each containing RGB images of a human retina graded by a medical expert on the following scale: 0 (no DR), 1 (mild DR), 2 (moderate DR), 3 (severe DR), and 4 (proliferative DR). |
| Dataset Splits | Yes | The numbers of samples S and K are hyperparameters to be optimized with a validation set. For details on models, training and validation procedures, and datasets used, see Appendix D. |
| Hardware Specification | Yes | Compute. All experiments were run on NVIDIA A100 GPUs provided by the Alan Turing Institute. |
| Software Dependencies | Yes | The code is written in Python (version 3.9) using PyTorch (version 1.10.0) and CUDA (version 11.3). |
| Experiment Setup | Yes | For further details about model architectures and training and evaluation protocols, see Appendix D. |