Bayesian Deep Ensembles via the Neural Tangent Kernel
Authors: Bobby He, Balaji Lakshminarayanan, Yee Whye Teh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks." See also Section 4, Experiments. |
| Researcher Affiliation | Collaboration | Bobby He, Department of Statistics, University of Oxford (bobby.he@stats.ox.ac.uk); Balaji Lakshminarayanan, Google Research, Brain Team (balajiln@google.com); Yee Whye Teh, Department of Statistics, University of Oxford (y.w.teh@stats.ox.ac.uk) |
| Pseudocode | Yes | Algorithm 1: NTKGP-param ensemble (see the illustrative sketch after the table) |
| Open Source Code | Yes | Code for this experiment is available at: https://github.com/bobby-he/bayesian-ntk. |
| Open Datasets | Yes | Flight Delays dataset [43], MNIST vs NotMNIST, CIFAR-10 vs SVHN |
| Dataset Splits | No | "In order to obtain probabilistic predictions, we temperature scale our trained ensemble predictions with cross-entropy loss on a held-out validation set." No specific split percentages or counts are provided for the validation set. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) are mentioned for running experiments. |
| Software Dependencies | No | "init(·) will be standard parameterisation initialisation in the JAX library Neural Tangents [38] unless stated otherwise." No specific version numbers for JAX or Neural Tangents are provided. |
| Experiment Setup | Yes | "For each ensemble method, we use MLP baselearners with two hidden layers of width 512, and erf activation." The weight parameter initialisation variance σ²_W is tuned using the validation accuracy on a small set of values around the He initialisation, σ²_W = 2 [44], for all classification experiments; CNN baselearners take the Myrtle-10 architecture [40] of channel-width 100. An illustrative Neural Tangents configuration follows after the table. |
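The paper's Algorithm 1 (NTKGP-param) modifies each deep-ensemble baselearner so that, in the infinite-width limit, the ensemble predictive matches the NTK-GP posterior. The exact parameter-space modification is given in the paper; the sketch below is only a minimal illustration of the underlying mechanism, a trainable network plus a frozen, independently initialised additive random function (in the spirit of randomized prior functions). It is not the authors' Algorithm 1 or code from the bayesian-ntk repository, and all helper names (`init_mlp`, `mlp`, `make_baselearner`, `fit`) are hypothetical.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import erf

def init_mlp(key, sizes):
    # Hypothetical He-style initialiser (sigma_W^2 = 2); not the paper's init().
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, w_key = jax.random.split(key)
        w = jax.random.normal(w_key, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params

def mlp(params, x):
    # MLP with erf activations on the hidden layers and a linear readout.
    for w, b in params[:-1]:
        x = erf(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def make_baselearner(key, sizes):
    k_train, k_prior = jax.random.split(key)
    theta = init_mlp(k_train, sizes)        # trainable parameters
    theta_prior = init_mlp(k_prior, sizes)  # frozen, independently sampled

    def predict(p, x):
        # Trainable network plus a fixed additive random function: the
        # frozen term injects prior variance that survives training.
        return mlp(p, x) + mlp(theta_prior, x)

    return theta, predict

def fit(theta, predict, x, y, lr=1e-2, steps=500):
    # Full-batch gradient descent on squared error (illustrative only).
    loss = lambda p: jnp.mean((predict(p, x) - y) ** 2)
    grad = jax.jit(jax.grad(loss))
    for _ in range(steps):
        theta = jax.tree_util.tree_map(lambda p, g: p - lr * g, theta, grad(theta))
    return theta
```

A Bayesian deep ensemble in this style is then a collection of such baselearners, each built and fit from an independent random key, with the predictive spread coming from both the random initialisations and the frozen additive terms.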
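For the experiment-setup row, the following is a hedged sketch of how the described MLP baselearner could be specified with the Neural Tangents stax API (the JAX library the paper names). The width, depth, erf activation, standard parameterisation, and He-style W_std = √2 follow the quotes above; the bias scale and input dimension are illustrative assumptions, not values taken from the paper or the bayesian-ntk repository.

```python
import jax
from neural_tangents import stax

# Two hidden layers of width 512 with erf activations, per the paper's MLP
# baselearner description. W_std = sqrt(2) matches the He initialisation
# variance sigma_W^2 = 2 that the authors tune around; b_std = 0.05 and the
# input dimension of 10 are illustrative assumptions.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=2.0 ** 0.5, b_std=0.05, parameterization='standard'),
    stax.Erf(),
    stax.Dense(512, W_std=2.0 ** 0.5, b_std=0.05, parameterization='standard'),
    stax.Erf(),
    stax.Dense(1, W_std=2.0 ** 0.5, b_std=0.05, parameterization='standard'),
)

key = jax.random.PRNGKey(0)
_, params = init_fn(key, input_shape=(-1, 10))
# kernel_fn additionally yields the analytic infinite-width kernels, e.g.
# kernel_fn(x1, x2, 'ntk'), of the kind the paper's finite ensembles are
# compared against when the analytic posterior predictive is available.
```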