Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Bayesian Deep Ensembles via the Neural Tangent Kernel
Authors: Bobby He, Balaji Lakshminarayanan, Yee Whye Teh
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, using finite width NNs we demonstrate that our Bayesian deep ensembles faithfully emulate the analytic posterior predictive when available, and can outperform standard deep ensembles in various out-of-distribution settings, for both regression and classification tasks. and 4 Experiments |
| Researcher Affiliation | Collaboration | Bobby He Department of Statistics University of Oxford EMAIL Balaji Lakshminarayanan Google Research Brain team EMAIL Yee Whye Teh Department of Statistics University of Oxford EMAIL |
| Pseudocode | Yes | Algorithm 1 NTKGP-param ensemble |
| Open Source Code | Yes | Code for this experiment is available at: https://github.com/bobby-he/bayesian-ntk. |
| Open Datasets | Yes | Flight Delays dataset [43], MNIST vs Not MNIST, CIFAR-10 vs SVHN |
| Dataset Splits | No | In order to obtain probabilistic predictions, we temperature scale our trained ensemble predictions with cross-entropy loss on a held-out validation set and tuned using the validation accuracy on a small set of values around the He initialisation. No specific split percentages or counts are provided for the validation set. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) are mentioned for running experiments. |
| Software Dependencies | No | init( ) will be standard parameterisation initialisation in the JAX library Neural Tangents [38] unless stated otherwise. No specific version numbers for JAX or Neural Tangents are provided. |
| Experiment Setup | Yes | For each ensemble method, we use MLP baselearners with two hidden layers of width 512, and erf activation. and The weight parameter initialisation variance σ²_W is tuned using the validation accuracy on a small set of values around the He initialisation, σ²_W = 2, [44] for all classification experiments. and baselearners taking the Myrtle-10 CNN architecture [40] of channel-width 100. |
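
The Dataset Splits row quotes the paper as temperature scaling ensemble predictions with cross-entropy loss on a held-out validation set. The following is a minimal sketch of that general technique in plain Python, not the authors' implementation: the function names, the golden-section search, the search bounds, and the toy logits are all assumptions made for illustration, since the excerpts above do not say how the temperature was optimised.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def mean_nll(logits_batch, labels, temperature):
    """Mean cross-entropy (negative log-likelihood) at a given temperature."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        total -= log_softmax(logits, temperature)[label]
    return total / len(labels)


def fit_temperature(logits_batch, labels, lo=0.05, hi=20.0, iters=60):
    """Golden-section search over log-temperature (hypothetical helper).

    The validation NLL is convex in 1/T, hence unimodal in log T,
    so a bracketing search is sufficient for this one-dimensional fit.
    The bounds lo and hi are illustrative assumptions.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if mean_nll(logits_batch, labels, math.exp(c)) < mean_nll(
            logits_batch, labels, math.exp(d)
        ):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return math.exp((a + b) / 2)


# Toy held-out validation set (made up): confident logits, one wrong prediction.
val_logits = [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [8.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
val_labels = [0, 1, 1, 2]

T = fit_temperature(val_logits, val_labels)
nll_before = mean_nll(val_logits, val_labels, 1.0)
nll_after = mean_nll(val_logits, val_labels, T)
```

Because the toy predictions are overconfident, the fitted temperature comes out above 1 and the validation cross-entropy drops. In practice the fitted `T` would then be applied to test-time logits before the softmax.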