Deterministic Variational Inference for Robust Bayesian Neural Networks
Authors: Anqi Wu, Sebastian Nowozin, Edward Meeds, Richard E. Turner, José Miguel Hernández-Lobato, Alexander L. Gaunt
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement deterministic variational inference (DVI) as described above to train small ReLU networks on UCI regression datasets (Dheeru & Karra Taniskidou, 2017). The experiments address the claims that our methods for eliminating gradient variance and automatic tuning of the prior improve the performance of the final trained model. (A sketch of this deterministic moment propagation appears after the table.) |
| Researcher Affiliation | Collaboration | (1) Princeton Neuroscience Institute, Princeton University; (2) Google AI Berlin; (3) Department of Engineering, University of Cambridge; (4) Microsoft Research, Cambridge |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our implementation in TensorFlow is available at https://github.com/Microsoft/deterministic-variational-inference |
| Open Datasets | Yes | We implement deterministic variational inference (DVI) as described above to train small ReLU networks on UCI regression datasets (Dheeru & Karra Taniskidou, 2017). |
| Dataset Splits | No | Each dataset is split into random training and test sets with 90% and 10% of the data respectively. This splitting process is repeated 20 times and the average test performance of each method at convergence is reported in Table 2. (The split proportions and repetition count are described, but the concrete splits or random seeds are not published; see the split-protocol sketch after the table.) |
| Hardware Specification | Yes | Figure 4 shows the time required to propagate activations through a single layer using the MCVI, DVI and dDVI methods on a Tesla V100 GPU. |
| Software Dependencies | No | Our implementation in TensorFlow is available at https://github.com/Microsoft/deterministic-variational-inference (TensorFlow is mentioned, but no specific version number is provided.) |
| Experiment Setup | Yes | The same model is used for each inference method: a single hidden layer of 50 units for each dataset considered, extended to 100 units in the special case of the larger protein structure dataset, prot. (See the forward-pass sketch after the table.) |
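
The "Research Type" row quotes the paper's core idea: DVI propagates the moments of activations through the network in closed form instead of sampling weights. The following is a minimal NumPy/SciPy sketch of that idea under a diagonal (mean-field) approximation, closer in spirit to the paper's dDVI variant than to full-covariance DVI; the function names and shapes are illustrative assumptions, not the authors' API.

```python
import numpy as np
from scipy.stats import norm

def linear_moments(mu_x, var_x, W_mean, W_var, b_mean, b_var):
    """Mean and variance of a = W x + b for independent Gaussian
    weights/biases and Gaussian input, keeping per-unit variances."""
    mu_a = W_mean @ mu_x + b_mean
    # Var[W x] = E[W^2] E[x^2] - E[W]^2 E[x]^2, expanded elementwise:
    var_a = W_var @ (var_x + mu_x ** 2) + (W_mean ** 2) @ var_x + b_var
    return mu_a, var_a

def relu_moments(mu, var):
    """Closed-form mean and variance of relu(h) for h ~ N(mu, var),
    applied elementwise via standard Gaussian integrals."""
    sigma = np.sqrt(var + 1e-12)
    alpha = mu / sigma
    cdf, pdf = norm.cdf(alpha), norm.pdf(alpha)
    m1 = mu * cdf + sigma * pdf                    # E[relu(h)]
    m2 = (mu ** 2 + var) * cdf + mu * sigma * pdf  # E[relu(h)^2]
    return m1, np.maximum(m2 - m1 ** 2, 0.0)       # clamp for safety
```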
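Stacking these helpers reproduces the architecture quoted in the "Experiment Setup" row: one hidden ReLU layer of 50 units (100 for prot) followed by a linear head. The parameter values below are arbitrary placeholders, not trained posteriors, and the input dimension is a made-up example.

```python
import numpy as np

def dvi_forward(mu_x, var_x, params):
    """Propagate input moments through hidden and output layers,
    using linear_moments and relu_moments defined above."""
    mu_h, var_h = linear_moments(mu_x, var_x, *params["hidden"])
    mu_h, var_h = relu_moments(mu_h, var_h)
    return linear_moments(mu_h, var_h, *params["out"])

# Placeholder Gaussian posteriors for an 8-feature dataset; H = 50
# matches the quoted setup (H = 100 for prot).
rng = np.random.default_rng(0)
D, H = 8, 50
params = {
    "hidden": (0.1 * rng.standard_normal((H, D)), 0.01 * np.ones((H, D)),
               np.zeros(H), 0.01 * np.ones(H)),
    "out": (0.1 * rng.standard_normal((1, H)), 0.01 * np.ones((1, H)),
            np.zeros(1), 0.01 * np.ones(1)),
}
mu_y, var_y = dvi_forward(rng.standard_normal(D), np.zeros(D), params)
```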
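The "Dataset Splits" row describes the evaluation protocol: 20 repetitions of a random 90%/10% train/test split. A small sketch of that protocol follows; the seed is chosen here for illustration, since the paper does not publish seeds or split files.

```python
import numpy as np

def random_splits(n, n_repeats=20, test_frac=0.10, seed=0):
    """Yield (train_idx, test_idx) index arrays for repeated random
    90/10 splits, matching the protocol quoted above."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * n)))
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        yield perm[n_test:], perm[:n_test]

# Usage: average test metrics over the 20 splits of a dataset of
# n examples, e.g. for train_idx, test_idx in random_splits(n): ...
```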