On the Expressiveness of Approximate Inference in Bayesian Neural Networks
Authors: Andrew Foong, David Burt, Yingzhen Li, Richard Turner
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our major findings are: 1. For shallow BNNs, there exist simple situations where no mean-field Gaussian or MC dropout (MCDO) distribution can faithfully represent the exact posterior predictive uncertainty (Criterion 1 is not satisfied). We prove in section 3 that in these instances the variance function of any fully-connected, single-hidden-layer ReLU BNN using these families suffers from a lack of in-between uncertainty: increased uncertainty in between well-separated regions of low uncertainty. This is especially problematic for lower-dimensional data, where we may expect some datapoints to be in between others. Examples include spatio-temporal data, or Bayesian optimisation for hyperparameter search, where we frequently wish to make predictions in unobserved regions in between observed regions. We verify that the exact posterior predictive does not have this limitation; hence this pathology is attributable solely to the restrictiveness of the approximating family (the first sketch after the table illustrates this variance check). 2. In section 4 we prove a universal approximation result showing that the mean and variance functions of deep approximate BNNs using mean-field Gaussian or MCDO distributions can uniformly approximate any continuous function and any continuous non-negative function, respectively. However, it remains to be shown that appropriate predictive means and variances will be found when optimising the ELBO. To test this, we focus on the low-dimensional, small-data regime, where comparisons to references for the exact posterior, such as the limiting GP [30, 22, 28], are easier to make. In section 4.2 we provide empirical evidence that in spite of its theoretical flexibility, VI in deep BNNs can still lead to distributions that suffer from similar pathologies to the shallow case, i.e. Criterion 2 is not satisfied. In section 5, we provide an active learning case study on a real-world dataset showing how in-between uncertainty can be a crucial feature of the posterior predictive. In this case, we provide evidence that although the inductive biases of the BNN model with exact inference can bring considerable benefits, these are lost when mean-field VI (MFVI) or MCDO are used. Code to reproduce our experiments can be found at https://github.com/cambridge-mlg/expressiveness-approx-bnns. |
| Researcher Affiliation | Collaboration | Andrew Y. K. Foong (University of Cambridge, ykf21@cam.ac.uk); David R. Burt (University of Cambridge, drb62@cam.ac.uk); Yingzhen Li (Microsoft Research, Yingzhen.Li@microsoft.com); Richard E. Turner (University of Cambridge & Microsoft Research, ret26@cam.ac.uk) |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce our experiments can be found at https://github.com/cambridge-mlg/expressiveness-approx-bnns. |
| Open Datasets | Yes | We now consider the impact of the pathologies described in sections 3 and 4 on active learning [36] on a real-world dataset, where the task is to use uncertainty information to intelligently select which points to label. ... we specifically analyse a dataset where we have observed active learning with approximate BNNs to fail: the Naval regression dataset [6], which is 14-dimensional and consists of 11,934 datapoints. (The final sketch after the table outlines this selection loop.) ... [6] Andrea Coraddu, Luca Oneto, Alessandro Ghio, Stefano Savio, Davide Anguita, and Massimo Figari. Machine learning approaches for improving condition-based maintenance of naval propulsion plants. Journal of Engineering for the Maritime Environment, 2014. |
| Dataset Splits | No | The paper mentions training on an 'active set' and evaluating on a 'held-out test set', but it does not specify a separate validation split or the methodology for creating such splits (e.g., exact percentages or counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train MFVI and MCDO for 20,000 iterations of ADAM at each step of active learning. (A minimal training-loop sketch in this spirit appears after the table.) |
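
To make the in-between uncertainty pathology concrete, the sketch below draws Monte Carlo samples from a hypothetical mean-field Gaussian over the weights of a single-hidden-layer ReLU network and inspects the predictive variance between two well-separated inputs. The variational means and standard deviations here are arbitrary placeholders (in practice they would come from fitting VI), and the precise variance bounds proved in section 3 of the paper are not reproduced; this only illustrates the kind of check the theory concerns.

```python
# Minimal sketch: Monte Carlo predictive variance of a single-hidden-layer
# ReLU BNN under a mean-field (fully factorised) Gaussian over the weights.
# The variational parameters below are arbitrary placeholders, NOT fitted.
import numpy as np

rng = np.random.default_rng(0)
H, S = 50, 10_000  # hidden width, number of MC weight samples

# Hypothetical variational means and standard deviations.
mu_w1, sd_w1 = rng.standard_normal((1, H)), 0.3 * np.ones((1, H))
mu_b1, sd_b1 = rng.standard_normal(H), 0.3 * np.ones(H)
mu_w2, sd_w2 = rng.standard_normal((H, 1)), 0.3 * np.ones((H, 1))

def predictive_samples(x):
    """Draw S function samples f(x) under the mean-field distribution."""
    out = np.empty((S, x.shape[0]))
    for s in range(S):
        w1 = mu_w1 + sd_w1 * rng.standard_normal(mu_w1.shape)
        b1 = mu_b1 + sd_b1 * rng.standard_normal(mu_b1.shape)
        w2 = mu_w2 + sd_w2 * rng.standard_normal(mu_w2.shape)
        out[s] = (np.maximum(x @ w1 + b1, 0.0) @ w2).squeeze(-1)
    return out

# A 1-D slice through input space, with "data regions" near x = -2 and x = +2.
x_grid = np.linspace(-3.0, 3.0, 61).reshape(-1, 1)
var = predictive_samples(x_grid).var(axis=0)

# A lack of in-between uncertainty shows up as the midpoint variance failing
# to rise above the variance at the flanking points, however the variational
# parameters are set; the paper's section 3 proves bounds of this flavour.
print(f"var(-2)={var[10]:.3f}  var(0)={var[30]:.3f}  var(+2)={var[50]:.3f}")
```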
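
The experiment-setup row reports 20,000 Adam iterations of MFVI per active-learning step. The following is a minimal sketch of such a training loop, not the authors' implementation: a single-hidden-layer ReLU BNN with a fully factorised Gaussian posterior, a standard-normal prior, and a reparameterised one-sample ELBO. The hidden width, learning rate, and noise variance are illustrative assumptions.

```python
# Minimal MFVI sketch (assumed hyperparameters, toy 1-D data).
import torch

torch.manual_seed(0)
x = torch.linspace(-2.0, 2.0, 50).unsqueeze(1)       # toy inputs
y = torch.sin(3.0 * x) + 0.1 * torch.randn_like(x)   # toy targets

H, noise_var = 50, 0.01  # hidden width and observation noise (assumptions)

# Variational parameters: a mean and log-std per weight and bias.
shapes = [(1, H), (H,), (H, 1), (1,)]
means = [torch.zeros(s, requires_grad=True) for s in shapes]
log_stds = [torch.full(s, -3.0, requires_grad=True) for s in shapes]

opt = torch.optim.Adam(means + log_stds, lr=1e-2)

def sample_params():
    # Reparameterisation trick: w = mu + sigma * eps, eps ~ N(0, I).
    return [m + ls.exp() * torch.randn_like(m) for m, ls in zip(means, log_stds)]

def forward(x, params):
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

for step in range(20_000):  # 20,000 Adam iterations, as reported in the paper
    opt.zero_grad()
    pred = forward(x, sample_params())
    # Gaussian log-likelihood of the data (up to an additive constant).
    log_lik = -0.5 * ((y - pred) ** 2 / noise_var).sum()
    # Closed-form KL(q || p) between factorised Gaussians, with a N(0, 1) prior.
    kl = sum(
        (0.5 * (m ** 2 + (2 * ls).exp() - 2 * ls - 1)).sum()
        for m, ls in zip(means, log_stds)
    )
    loss = -(log_lik - kl)  # negative one-sample ELBO (full-batch, no minibatching)
    loss.backward()
    opt.step()
```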
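
Finally, a hedged sketch of the variance-based active-learning loop described in section 5: at each step, add the pool point with the largest predictive variance to the active set, retrain, and evaluate on the held-out test set. The names `fit` and `predict_var` are placeholders supplied by the caller, not the repo's API; the repository linked above contains the authors' actual experiment code.

```python
# Minimal active-learning loop sketch (placeholder fit/predict_var callables).
import numpy as np

def active_learning(x_pool, y_pool, x_init, y_init, fit, predict_var, steps=10):
    """fit(x, y) trains a model (e.g. 20,000 Adam steps of MFVI);
    predict_var(model, x) returns per-point predictive variances."""
    x_act, y_act = x_init.copy(), y_init.copy()
    for _ in range(steps):
        model = fit(x_act, y_act)
        # Acquire the pool point where the model is most uncertain.
        i = int(np.argmax(predict_var(model, x_pool)))
        x_act = np.vstack([x_act, x_pool[i:i + 1]])
        y_act = np.concatenate([y_act, y_pool[i:i + 1]])
        x_pool = np.delete(x_pool, i, axis=0)
        y_pool = np.delete(y_pool, i, axis=0)
    return x_act, y_act
```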