Bayesian Deep Ensembles via the Neural Tangent Kernel

Authors: Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks." (Section 4, Experiments)
Researcher Affiliation | Collaboration | Bobby He (Department of Statistics, University of Oxford, bobby.he@stats.ox.ac.uk); Balaji Lakshminarayanan (Google Research, Brain Team, balajiln@google.com); Yee Whye Teh (Department of Statistics, University of Oxford, y.w.teh@stats.ox.ac.uk)
Pseudocode | Yes | "Algorithm 1 NTKGP-param ensemble"
Open Source Code | Yes | "Code for this experiment is available at: https://github.com/bobby-he/bayesian-ntk."
Open Datasets | Yes | Flight Delays dataset [43], MNIST vs. NotMNIST, CIFAR-10 vs. SVHN
Dataset Splits | No | "In order to obtain probabilistic predictions, we temperature scale our trained ensemble predictions with cross-entropy loss on a held-out validation set" and "tuned using the validation accuracy on a small set of values around the He initialisation". No specific split percentages or counts are provided for the validation set (a generic temperature-scaling sketch follows the table).
Hardware Specification | No | No specific hardware details (GPU/CPU models, memory, or cloud instances) are mentioned for running the experiments.
Software Dependencies | No | "init() will be standard parameterisation initialisation in the JAX library Neural Tangents [38] unless stated otherwise." No specific version numbers for JAX or Neural Tangents are provided.
Experiment Setup | Yes | "For each ensemble method, we use MLP baselearners with two hidden layers of width 512, and erf activation."; "The weight parameter initialisation variance σ²_W is tuned using the validation accuracy on a small set of values around the He initialisation, σ²_W = 2 [44], for all classification experiments."; "baselearners taking the Myrtle-10 CNN architecture [40] of channel-width 100." A sketch of the analytic NTK-GP posterior predictive for the MLP setup follows the table.
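
The "Software Dependencies" and "Experiment Setup" rows reference the JAX library Neural Tangents and MLP baselearners with two hidden layers of width 512 and erf activation. As a rough illustration of how the analytic NTK-GP posterior predictive (the quantity the paper's Bayesian deep ensembles are meant to emulate) can be computed for such an architecture, here is a minimal sketch using the public neural-tangents API. This is not the authors' code: the toy regression data, the diag_reg value, and the W_std/b_std choices are placeholder assumptions, and function signatures may differ slightly across neural-tangents versions.

```python
# Minimal sketch (not the authors' code): analytic NTK-GP posterior predictive
# for a 2-hidden-layer, width-512, erf MLP via the neural-tangents library.
# Toy data, diag_reg, and W_std/b_std values are illustrative assumptions.
import jax.numpy as jnp
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Architecture matching the "Experiment Setup" row: two hidden layers of
# width 512 with erf activation (W_std here is an assumed He-like value).
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512, W_std=2.0 ** 0.5, b_std=0.05), stax.Erf(),
    stax.Dense(512, W_std=2.0 ** 0.5, b_std=0.05), stax.Erf(),
    stax.Dense(1, W_std=2.0 ** 0.5, b_std=0.05),
)

# Placeholder 1-D regression data (the paper's experiments use real datasets).
data_key, noise_key = random.split(random.PRNGKey(0))
x_train = random.uniform(data_key, (20, 1), minval=-1.0, maxval=1.0)
y_train = jnp.sin(3.0 * x_train) + 0.1 * random.normal(noise_key, (20, 1))
x_test = jnp.linspace(-1.5, 1.5, 100).reshape(-1, 1)

# Closed-form predictive of the infinitely wide network trained to convergence
# with gradient descent on MSE; get='ntk' selects the NTK-GP posterior
# (as opposed to the NNGP posterior, get='nngp').
predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-3)
ntk_mean, ntk_cov = predict_fn(x_test=x_test, get='ntk', compute_cov=True)

# Posterior mean and covariance over the test inputs.
print(ntk_mean.shape, ntk_cov.shape)
```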
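
The "Dataset Splits" row quotes the paper's use of temperature scaling on a held-out validation set for the classification experiments. The split sizes are not reported, so the snippet below is only a generic sketch of temperature scaling with cross-entropy on validation data; the logits, label count, and candidate temperature grid are made up, and no claim is made that this matches the authors' tuning procedure.

```python
# Generic temperature-scaling sketch (not the authors' code): pick the
# temperature T that minimises cross-entropy on held-out validation logits.
# Validation data, label count, and the candidate grid are illustrative.
import jax.numpy as jnp
from jax import nn, random

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    log_probs = nn.log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=-1))

logit_key, label_key = random.split(random.PRNGKey(0))
val_logits = random.normal(logit_key, (256, 10))        # placeholder ensemble logits
val_labels = random.randint(label_key, (256,), 0, 10)   # placeholder labels

# Grid-search the temperature; scaled predictions are softmax(logits / T).
temperatures = jnp.linspace(0.5, 5.0, 46)
losses = jnp.array([cross_entropy(val_logits / t, val_labels) for t in temperatures])
best_t = temperatures[jnp.argmin(losses)]
print("selected temperature:", float(best_t))
```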