Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
Authors: Mattias Teye, Hossein Azizpour, Kevin Smith
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches. |
| Researcher Affiliation | Collaboration | (1) School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; (2) current address: Electronic Arts, SEED, Stockholm, Sweden (this work was carried out at Budbee AB); (3) Science for Life Laboratory. |
| Pseudocode | Yes | Algorithm 1: MCBN Algorithm (a hedged code sketch of this procedure appears below the table). |
| Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/icml-mcbn/mcbn. |
| Open Datasets | Yes | Our quantitative analysis relies on CIFAR10 for image classification and eight standard regression datasets, listed in Appendix Table 1, which are publicly available from the UCI Machine Learning Repository (University of California, 2017) and Delve (Ghahramani, 1996). |
| Dataset Splits | Yes | Results were averaged over five random splits of 20% test and 80% training and cross-validation (CV) data. For each split, 5-fold CV by grid search with an RMSE minimization objective was used to find training hyperparameters and the optimal number of epochs, out of a maximum of 2000 (see the cross-validation sketch below the table). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. It states that 'Implementation was done in TensorFlow', but gives no hardware specifics. |
| Software Dependencies | No | The paper mentions 'TensorFlow' and the 'Adam optimizer' but does not specify version numbers. |
| Experiment Setup | Yes | For the regression task, all models share a similar architecture: two hidden layers with 50 units each, and ReLU activations... For BN-based models, the hyperparameter grid consisted of a weight decay factor ranging from 0.1 to 1e-15 on a log10 scale, and a batch size range from 32 to 1024 on a log2 scale. For DO-based models, the hyperparameter grid consisted of the same weight decay range, and dropout probabilities in {0.2, 0.1, 0.05, 0.01, 0.005, 0.001}. DO-based models used a batch size of 32 in all evaluations. ... Estimates for the predictive distribution were obtained by taking T = 500 stochastic forward passes through the network. ... We trained a ResNet32 architecture with a batch size of 32, learning rate of 0.1, weight decay of 0.0002, leaky ReLU slope of 0.1, and 5 residual units. SGD with momentum was used as the optimizer. (The regression architecture and the MCBN prediction loop are sketched in code below the table.) |
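
The sketches below expand on the rows above. First, the regression architecture quoted in the Experiment Setup row (two hidden layers of 50 units with ReLU activations) can be written down directly. This is a minimal sketch assuming a TensorFlow 2 / Keras implementation with a BatchNormalization layer in each hidden block (the exact placement is an assumption); the hyperparameter values are placeholders, the function name `build_regression_net` is illustrative, and the paper's released code at https://github.com/icml-mcbn/mcbn remains the authoritative reference.

```python
import tensorflow as tf

def build_regression_net(input_dim, weight_decay=1e-2):
    """Two hidden layers of 50 units with batch normalization and ReLU,
    matching the regression setup described in the paper. The weight decay
    value here is a placeholder, not one of the paper's tuned settings."""
    reg = tf.keras.regularizers.l2(weight_decay)
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(50, kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(50, kernel_regularizer=reg),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(1),
    ])

# Adam is the optimizer the paper mentions for regression; the loss and
# input dimension below are placeholders for this sketch.
model = build_regression_net(input_dim=8)
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
```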
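Second, the Pseudocode row points to Algorithm 1 (MCBN): at test time, run T stochastic forward passes, each normalized with the batch statistics of a freshly sampled training mini-batch, and summarize the passes into a predictive mean and variance. The sketch below is a simplified rendering of that loop for a Keras model such as the one above, with `x_train` and `x_test` assumed to be NumPy arrays. For brevity it concatenates the test inputs with the sampled mini-batch and relies on training-mode batch normalization, so the test points contribute slightly to the batch statistics, whereas Algorithm 1 computes them from the training mini-batch alone; it also omits the inverse model precision term that Algorithm 1 adds to the predictive variance.

```python
import numpy as np

def mcbn_predict(model, x_test, x_train, batch_size=32, T=500, rng=None):
    """Approximate MCBN predictive mean and variance by averaging T forward
    passes, each normalized with statistics from a random training mini-batch."""
    rng = rng if rng is not None else np.random.default_rng()
    preds = []
    for _ in range(T):
        idx = rng.choice(len(x_train), size=batch_size, replace=False)
        joint = np.concatenate([x_train[idx], x_test]).astype("float32")
        # training=True makes BatchNormalization use the current batch's
        # statistics rather than the stored moving averages (it also updates
        # those moving averages as a side effect, which is harmless here).
        out = model(joint, training=True).numpy()
        preds.append(out[batch_size:])           # keep only the test outputs
    preds = np.stack(preds)                      # shape (T, n_test, out_dim)
    # Algorithm 1 additionally adds the inverse model precision tau^-1 to the
    # predictive variance; that constant term is omitted in this sketch.
    return preds.mean(axis=0), preds.var(axis=0)

mean, var = mcbn_predict(model, x_test, x_train, batch_size=32, T=500)
```

With T = 500 passes, as quoted in the Experiment Setup row, this yields one predictive mean and one predictive variance per test point.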
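Finally, the Dataset Splits row describes the evaluation protocol: five random 80/20 splits, each with a 5-fold cross-validation grid search that minimizes RMSE. A minimal sketch of that outer loop follows, assuming scikit-learn utilities; `train_and_eval_rmse` is a hypothetical helper standing in for model training and validation scoring.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def evaluate_protocol(X, y, param_grid, n_repeats=5, seed=0):
    """Five random 80/20 splits; on each, a 5-fold CV grid search picks the
    configuration with the lowest mean validation RMSE. `train_and_eval_rmse`
    is a hypothetical helper: it would train a model with `params` (including
    the number of epochs, capped at 2000 in the paper) and return the RMSE
    on the validation fold."""
    selected = []
    for r in range(n_repeats):
        X_cv, X_test, y_cv, y_test = train_test_split(
            X, y, test_size=0.2, random_state=seed + r)
        best_params, best_rmse = None, np.inf
        for params in param_grid:
            folds = KFold(n_splits=5, shuffle=True, random_state=seed).split(X_cv)
            rmse = np.mean([
                train_and_eval_rmse(X_cv[tr], y_cv[tr], X_cv[va], y_cv[va], params)
                for tr, va in folds
            ])
            if rmse < best_rmse:
                best_params, best_rmse = params, rmse
        selected.append((best_params, (X_test, y_test)))
    return selected
```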