Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate our results through various Laplace approximations on common deep ReLU networks. Furthermore, while our theoretical analysis is focused on the binary classification case, we also experimentally show that these Bayesian approaches yield good performance in the multiclass classification setting, suggesting that our analysis may carry over to this case." From Section 4 (Experiments): "We corroborate our theoretical results via four experiments using various Gaussian-based Bayesian methods."
Researcher Affiliation | Academia | "University of Tübingen; MPI for Intelligent Systems, Tübingen."
Pseudocode | No | The paper contains detailed mathematical derivations and descriptions of methods but does not include any explicitly labeled “Pseudocode” or “Algorithm” blocks.
Open Source Code | Yes | "We mainly use a last-layer Laplace approximation (LLLA), where a Laplace approximation with an exact Hessian or its Kronecker factors is applied only to the last layer of a network (Appendix B)." The accompanying footnote links to the code at https://github.com/wiseodd/last_layer_laplace. (A minimal LLLA sketch appears below the table.)
Open Datasets | Yes | "Unless stated otherwise, we use LeNet (for MNIST) or ResNet-18 (for CIFAR-10, SVHN, CIFAR-100) architectures."
Dataset Splits | Yes | "For each dataset that we use, we obtain a validation set via a random split from the respective test set. ... Lastly, all numbers reported in this section are averages along with their standard deviations over 10 trials." The accompanying footnote states: "We use 50, 1000, and 2000 points for the toy, binary, and multi-class classification cases, respectively." (A split sketch appears below the table.)
Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions software tools such as "PyCalib (Wenger et al., 2020)" and "GPyTorch (Gardner et al., 2018)", but it does not specify version numbers for these or for any other software dependencies such as the programming language or deep learning framework.
Experiment Setup | Yes | "To obtain the optimal hyperparameter σ₀², we follow (8) with λ set to 0.25. We use 20 posterior samples for both DLA and KFLA." (A sampling sketch appears below the table.)
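
Since the paper's central method is the last-layer Laplace approximation (LLLA), a minimal sketch may help clarify what is being reproduced. This is not the authors' implementation (see the linked repository for that); it assumes a trained network already split into a fixed feature map phi(x) and a last linear layer with MAP weights, and all names are illustrative.

```python
import math
import torch

# Minimal last-layer Laplace sketch for binary classification.
# Assumes a trained network split into a fixed feature extractor phi(x) and a
# last linear layer with MAP weights w_map; names here are illustrative.

def last_layer_laplace(phi_train, w_map, prior_precision=1.0):
    """Gaussian posterior N(w_map, H^{-1}) over the last-layer weights.

    phi_train: (N, D) features of the training inputs
    w_map:     (D,) MAP weights of the last layer
    """
    p = torch.sigmoid(phi_train @ w_map)
    # Hessian of the negative log-posterior of logistic regression:
    # H = sum_n p_n (1 - p_n) phi_n phi_n^T + prior_precision * I
    H = (phi_train * (p * (1 - p)).unsqueeze(1)).T @ phi_train
    H = H + prior_precision * torch.eye(phi_train.shape[1])
    return w_map, torch.linalg.inv(H)

def predict(phi_x, w_map, cov):
    """Probit approximation to the Bayesian predictive p(y=1 | x)."""
    mean = phi_x @ w_map                 # mean of the output logit
    var = (phi_x @ cov * phi_x).sum(-1)  # variance of the output logit
    kappa = 1.0 / torch.sqrt(1.0 + (math.pi / 8.0) * var)
    return torch.sigmoid(kappa * mean)
```

Because `predict` scales the logit by κ ≤ 1, points with high predictive variance are pushed toward probability 1/2, which is the overconfidence fix the paper analyzes.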
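
The validation split quoted in the Dataset Splits row is straightforward to reproduce. A sketch, assuming torchvision's MNIST test set and the 1,000-point size quoted for the binary case; the seed is an assumption, since the paper does not specify one:

```python
import torch
from torchvision import datasets, transforms

# Carve a validation set out of the test set via a random split, as described
# in the paper; the seed below is an assumption.
test_set = datasets.MNIST("data/", train=False, download=True,
                          transform=transforms.ToTensor())
n_val = 1000  # the paper uses 50 / 1000 / 2000 for toy / binary / multi-class
val_set, reduced_test_set = torch.utils.data.random_split(
    test_set, [n_val, len(test_set) - n_val],
    generator=torch.Generator().manual_seed(0))
```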
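
The 20-sample prediction for DLA and KFLA quoted in the Experiment Setup row amounts to averaging softmax outputs over weight samples drawn from the Laplace posterior. A generic sketch, where `posterior.sample()` and `posterior.forward_with(w, x)` are hypothetical helpers, not the authors' API:

```python
import torch

@torch.no_grad()
def mc_predict(x, posterior, n_samples=20):
    """Average softmax predictions over posterior weight samples.

    `posterior.sample()` and `posterior.forward_with(w, x)` are hypothetical
    stand-ins for drawing weights from a Laplace posterior and running the
    network with them.
    """
    probs = 0.0
    for _ in range(n_samples):
        w = posterior.sample()
        probs = probs + torch.softmax(posterior.forward_with(w, x), dim=-1)
    return probs / n_samples
```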