Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate our results through various Laplace approximations on common deep ReLU networks. Furthermore, while our theoretical analysis is focused on the binary classification case, we also experimentally show that these Bayesian approaches yield good performance in the multiclass classification setting, suggesting that our analysis may carry over to this case." From Section 4 (Experiments): "We corroborate our theoretical results via four experiments using various Gaussian-based Bayesian methods."
Researcher Affiliation | Academia | "University of Tübingen; MPI for Intelligent Systems, Tübingen."
Pseudocode | No | The paper contains detailed mathematical derivations and descriptions of methods but does not include any explicitly labeled “Pseudocode” or “Algorithm” blocks.
Open Source Code | Yes | "We mainly use a last-layer Laplace approximation (LLLA), where a Laplace approximation with an exact Hessian or its Kronecker factors is applied only to the last layer of a network (Appendix B)." The accompanying footnote links to the code at https://github.com/wiseodd/last_layer_laplace. (A minimal LLLA sketch appears below the table.)
Open Datasets | Yes | "Unless stated otherwise, we use LeNet (for MNIST) or ResNet-18 (for CIFAR-10, SVHN, CIFAR-100) architectures."
Dataset Splits | Yes | "For each dataset that we use, we obtain a validation set via a random split from the respective test set. ... Lastly, all numbers reported in this section are averages along with their standard deviations over 10 trials." The accompanying footnote states: "We use 50, 1000, and 2000 points for the toy, binary, and multi-class classification cases, respectively." (A split sketch appears below the table.)
Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions software tools such as "PyCalib (Wenger et al., 2020)" and "GPyTorch (Gardner et al., 2018)", but it does not specify version numbers for these or for any other software dependencies such as the programming language or deep learning framework.
Experiment Setup | Yes | "To obtain the optimal hyperparameter σ₀², we follow (8) with λ set to 0.25. We use 20 posterior samples for both DLA and KFLA." (A sampling sketch appears below the table.)
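
Since the paper's central method is the last-layer Laplace approximation (LLLA), a minimal sketch may help clarify what is being reproduced. This is not the authors' implementation (see the linked repository for that); it assumes a trained network already split into a fixed feature map phi(x) and a last linear layer with MAP weights, and all names are illustrative.

```python
import math
import torch

# Minimal last-layer Laplace sketch for binary classification.
# Assumes a trained network split into a fixed feature extractor phi(x) and a
# last linear layer with MAP weights w_map; names here are illustrative.

def last_layer_laplace(phi_train, w_map, prior_precision=1.0):
    """Gaussian posterior N(w_map, H^{-1}) over the last-layer weights.

    phi_train: (N, D) features of the training inputs
    w_map:     (D,) MAP weights of the last layer
    """
    p = torch.sigmoid(phi_train @ w_map)
    # Hessian of the negative log-posterior of logistic regression:
    # H = sum_n p_n (1 - p_n) phi_n phi_n^T + prior_precision * I
    H = (phi_train * (p * (1 - p)).unsqueeze(1)).T @ phi_train
    H = H + prior_precision * torch.eye(phi_train.shape[1])
    return w_map, torch.linalg.inv(H)

def predict(phi_x, w_map, cov):
    """Probit approximation to the Bayesian predictive p(y=1 | x)."""
    mean = phi_x @ w_map                 # mean of the output logit
    var = (phi_x @ cov * phi_x).sum(-1)  # variance of the output logit
    kappa = 1.0 / torch.sqrt(1.0 + (math.pi / 8.0) * var)
    return torch.sigmoid(kappa * mean)
```

Because `predict` scales the logit by κ ≤ 1, points with high predictive variance are pushed toward probability 1/2, which is the overconfidence fix the paper analyzes.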
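
The validation split quoted in the Dataset Splits row is straightforward to reproduce. A sketch, assuming torchvision's MNIST test set and the 1,000-point size quoted for the binary case; the seed is an assumption, since the paper does not specify one:

```python
import torch
from torchvision import datasets, transforms

# Carve a validation set out of the test set via a random split, as described
# in the paper; the seed below is an assumption.
test_set = datasets.MNIST("data/", train=False, download=True,
                          transform=transforms.ToTensor())
n_val = 1000  # the paper uses 50 / 1000 / 2000 for toy / binary / multi-class
val_set, reduced_test_set = torch.utils.data.random_split(
    test_set, [n_val, len(test_set) - n_val],
    generator=torch.Generator().manual_seed(0))
```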
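
The 20-sample prediction for DLA and KFLA quoted in the Experiment Setup row amounts to averaging softmax outputs over weight samples drawn from the Laplace posterior. A generic sketch, where `posterior.sample()` and `posterior.forward_with(w, x)` are hypothetical helpers, not the authors' API:

```python
import torch

@torch.no_grad()
def mc_predict(x, posterior, n_samples=20):
    """Average softmax predictions over posterior weight samples.

    `posterior.sample()` and `posterior.forward_with(w, x)` are hypothetical
    stand-ins for drawing weights from a Laplace posterior and running the
    network with them.
    """
    probs = 0.0
    for _ in range(n_samples):
        w = posterior.sample()
        probs = probs + torch.softmax(posterior.forward_with(w, x), dim=-1)
    return probs / n_samples
```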