Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Authors: Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose an alternative to Bayesian NNs that is simple to implement, readily parallelizable, requires very little hyperparameter tuning, and yields high quality predictive uncertainty estimates. Through a series of experiments on classification and regression benchmarks, we demonstrate that our method produces well-calibrated uncertainty estimates which are as good or better than approximate Bayesian NNs. To assess robustness to dataset shift, we evaluate the predictive uncertainty on test examples from known and unknown distributions, and show that our method is able to express higher uncertainty on out-of-distribution examples. We demonstrate the scalability of our method by evaluating predictive uncertainty estimates on ImageNet. |
| Researcher Affiliation | Industry | Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell, DeepMind, {balajiln,apritzel,cblundell}@google.com |
| Pseudocode | Yes (a hedged training sketch follows the table) | Algorithm 1: Pseudocode of the training procedure for our method |
| Open Source Code | No | The paper does not contain an explicit statement or link providing access to the source code for the described methodology. |
| Open Datasets | Yes | Regression results table (column headers: Datasets; RMSE for PBP, MC-dropout, Deep Ensembles; NLL for PBP, MC-dropout, Deep Ensembles) covering Boston housing and other UCI datasets; MNIST, SVHN and ImageNet; [51] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. |
| Dataset Splits | No | The paper mentions train-test splits and standard datasets, but does not provide specific details on validation splits (percentages, counts, or explicit standard validation sets). |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., CPU, GPU models, or cloud computing instances with detailed specifications). |
| Software Dependencies | No | The paper mentions "Torch" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Unless otherwise specified, we used batch size of 100 and Adam optimizer with fixed learning rate of 0.1 in our experiments. ... We trained for 40 epochs; we refer to [24] for further details about the datasets and the experimental protocol. |
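
The training procedure referenced in the Pseudocode row (Algorithm 1) combines the setup details quoted above: each of M independently initialized networks is trained on the full dataset with a proper scoring rule as the loss, optionally augmented with fast-gradient-sign adversarial examples, and predictions are averaged at test time. The following is a minimal sketch, assuming PyTorch, an illustrative MLP classifier, and an illustrative perturbation size `EPSILON`; it is not the authors' implementation (the paper mentions Torch).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M = 5            # ensemble size used as the paper's default
EPSILON = 0.01   # adversarial-perturbation size; illustrative assumption

def make_net():
    # Placeholder MLP; the paper uses task-specific architectures.
    return nn.Sequential(nn.Flatten(), nn.Linear(784, 200), nn.ReLU(),
                         nn.Linear(200, 10))

def train_member(net, loader, epochs=40):
    # Each member trains independently from its own random initialization,
    # minimizing a proper scoring rule (cross-entropy / NLL for classification).
    opt = torch.optim.Adam(net.parameters(), lr=0.1)  # lr quoted in the setup row
    for _ in range(epochs):
        for x, y in loader:
            x = x.clone().requires_grad_(True)
            loss = F.cross_entropy(net(x), y)
            # Optional adversarial-training variant: add the loss on
            # fast-gradient-sign perturbations of the batch.
            grad = torch.autograd.grad(loss, x, retain_graph=True)[0]
            x_adv = (x + EPSILON * grad.sign()).detach()
            total = loss + F.cross_entropy(net(x_adv), y)
            opt.zero_grad()
            total.backward()
            opt.step()
    return net

def ensemble_predict(nets, x):
    # Predictive distribution: uniform average of the members' softmax outputs.
    probs = torch.stack([F.softmax(net(x), dim=-1) for net in nets])
    return probs.mean(dim=0)

# Example: train M members on a DataLoader `loader` (batch size 100 per the
# setup row), then combine them with ensemble_predict.
# nets = [train_member(make_net(), loader) for _ in range(M)]
```
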
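For the regression benchmarks scored by RMSE and NLL in the Open Datasets row, the paper has each member output a mean and a variance, trains with the heteroscedastic Gaussian NLL (variance made positive via softplus plus a small floor), and combines members as a uniformly weighted Gaussian mixture. A minimal sketch, assuming PyTorch and illustrative function names (`gaussian_nll`, `ensemble_moments`):

```python
import math
import torch
import torch.nn.functional as F

def gaussian_nll(mean, raw_var, target):
    # Heteroscedastic Gaussian negative log-likelihood; softplus + 1e-6 keeps
    # the predicted variance positive, as described in the paper.
    var = F.softplus(raw_var) + 1e-6
    return (0.5 * torch.log(var)
            + (target - mean) ** 2 / (2.0 * var)
            + 0.5 * math.log(2.0 * math.pi)).mean()

def ensemble_moments(means, variances):
    # Uniformly weighted Gaussian mixture over the M members:
    # mu* = mean of member means; var* = mean of (var_m + mu_m^2) - mu*^2.
    mu = means.mean(dim=0)
    var = (variances + means ** 2).mean(dim=0) - mu ** 2
    return mu, var
```
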