Bayesian Uncertainty Estimation for Batch Normalized Deep Networks

Authors: Mattias Teye, Hossein Azizpour, Kevin Smith

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches.
Researcher Affiliation | Collaboration | 1) School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; 2) Current address: Electronic Arts, SEED, Stockholm, Sweden (this work was carried out at Budbee AB); 3) Science for Life Laboratory.
Pseudocode | Yes | Algorithm 1: MCBN Algorithm (a sketch of the MCBN inference loop is given after the table).
Open Source Code | Yes | Code for reproducing our experiments is available at https://github.com/icml-mcbn/mcbn.
Open Datasets | Yes | Our quantitative analysis relies on CIFAR10 for image classification and eight standard regression datasets, listed in Appendix Table 1. These datasets are publicly available from the UCI Machine Learning Repository (University of California, 2017) and Delve (Ghahramani, 1996).
Dataset Splits | Yes | Results were averaged over five random splits of 20% test and 80% training and cross-validation (CV) data. For each split, 5-fold CV by grid search with an RMSE minimization objective was used to find the training hyperparameters and the optimal number of epochs, out of a maximum of 2000 (see the split/cross-validation sketch after the table).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. It notes only that "Implementation was done in TensorFlow", without any hardware specifics.
Software Dependencies | No | The paper mentions TensorFlow and the Adam optimizer but does not specify their version numbers.
Experiment Setup | Yes | For the regression task, all models share a similar architecture: two hidden layers with 50 units each, and ReLU activations... For BN-based models, the hyperparameter grid consisted of a weight decay factor ranging from 0.1 down to 1e-15 on a log10 scale, and a batch size range from 32 to 1024 on a log2 scale. For DO-based models, the hyperparameter grid consisted of the same weight decay range, and dropout probabilities in {0.2, 0.1, 0.05, 0.01, 0.005, 0.001}. DO-based models used a batch size of 32 in all evaluations. ... Estimates for the predictive distribution were obtained by taking T = 500 stochastic forward passes through the network. ... We trained a ResNet32 architecture with a batch size of 32, learning rate of 0.1, weight decay of 0.0002, leaky ReLU slope of 0.1, and 5 residual units. SGD with momentum was used as the optimizer. (See the hyperparameter-grid sketch after the table.)
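
For reference, the MCBN procedure named in the Pseudocode row can be sketched as follows. This is a minimal PyTorch sketch, not the authors' TensorFlow implementation: `make_regressor`, the default `batch_size`, and the noise precision `tau` are illustrative placeholders. The idea, per the paper, is that each stochastic forward pass re-estimates the batch-norm statistics from a randomly drawn training mini-batch, and the T predictions are summarized into a predictive mean and variance.

```python
import torch
import torch.nn as nn

def make_regressor(d_in):
    # Two hidden layers of 50 units with batch norm and ReLU, mirroring the
    # regression architecture quoted in the Experiment Setup row (sketch only).
    return nn.Sequential(
        nn.Linear(d_in, 50), nn.BatchNorm1d(50), nn.ReLU(),
        nn.Linear(50, 50), nn.BatchNorm1d(50), nn.ReLU(),
        nn.Linear(50, 1),
    )

@torch.no_grad()
def mcbn_predict(model, x_test, x_train, batch_size=32, T=500, tau=1.0):
    # With momentum = 1.0, the BN running statistics equal those of the last
    # batch seen, so a forward pass over a training mini-batch "loads" them.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm1d):
            m.momentum = 1.0
    preds = []
    for _ in range(T):
        idx = torch.randperm(len(x_train))[:batch_size]  # random training mini-batch
        model.train()        # BN computes and stores the mini-batch statistics
        model(x_train[idx])  # forward pass used only to refresh BN statistics
        model.eval()         # BN now normalizes the test input with those statistics
        preds.append(model(x_test))
    preds = torch.stack(preds)  # shape (T, N, 1)
    # Predictive mean and variance over the T passes; tau stands in for the
    # modeled observation-noise precision (a placeholder, not the paper's value).
    return preds.mean(0), preds.var(0) + 1.0 / tau
```

A call such as `mcbn_predict(model, x_test, x_train, batch_size=32, T=500)` reproduces the T = 500 stochastic forward passes mentioned in the Experiment Setup row, with the cross-validated batch size substituted for the default.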
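The Dataset Splits row describes five random 80/20 splits with a 5-fold cross-validated grid search inside each split. The loop below is a rough sketch of that protocol, assuming hypothetical `build_and_train` and `rmse` helpers; it is not the authors' code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def evaluate(X, y, grid, build_and_train, rmse, n_repeats=5, seed=0):
    test_scores = []
    for r in range(n_repeats):
        # 20% test, 80% training + cross-validation data; a new split each repeat.
        X_cv, X_te, y_cv, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=seed + r)
        best_score, best_params = np.inf, None
        for params in grid:
            folds = KFold(n_splits=5, shuffle=True, random_state=seed)
            scores = [rmse(build_and_train(X_cv[tr], y_cv[tr], **params),
                           X_cv[va], y_cv[va])
                      for tr, va in folds.split(X_cv)]
            if np.mean(scores) < best_score:          # RMSE minimization objective
                best_score, best_params = np.mean(scores), params
        # Refit on all CV data with the selected hyperparameters, score on the test split.
        model = build_and_train(X_cv, y_cv, **best_params)
        test_scores.append(rmse(model, X_te, y_te))
    return float(np.mean(test_scores)), float(np.std(test_scores))
```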
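Finally, the hyperparameter grids quoted in the Experiment Setup row can be written down directly. The endpoints below follow that quoted description (weight decay from 0.1 down to 1e-15 in log10 steps, BN batch sizes from 32 to 1024 in powers of two); the variable names are illustrative, not the paper's.

```python
import numpy as np
from itertools import product

weight_decays = 10.0 ** np.arange(-1, -16, -1)   # 0.1, 0.01, ..., 1e-15 (log10 steps)
bn_batch_sizes = 2 ** np.arange(5, 11)           # 32, 64, 128, 256, 512, 1024 (log2 steps)
dropout_probs = [0.2, 0.1, 0.05, 0.01, 0.005, 0.001]

# BN-based models: search over weight decay and batch size.
bn_grid = [{"weight_decay": float(wd), "batch_size": int(bs)}
           for wd, bs in product(weight_decays, bn_batch_sizes)]

# DO-based models: same weight decay range and a dropout probability,
# with the batch size fixed at 32 as stated in the table.
do_grid = [{"weight_decay": float(wd), "dropout_p": p, "batch_size": 32}
           for wd, p in product(weight_decays, dropout_probs)]
```

Either grid can be passed as the `grid` argument of the `evaluate` sketch above to mimic the reported model-selection procedure.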