A Rigorous Link between Deep Ensembles and (Variational) Bayesian Methods

Authors: Veit David Wild, Sahra Ghalebikesabi, Dino Sejdinovic, Jeremias Knoblauch

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (5 experiments) | Since the paper's primary focus is on theory, we use two experiments to reinforce some of the predictions it makes in previous sections, and a third experiment that shows why, in direct contradiction to a naive interpretation of the presented theory, it is typically difficult to beat simple DEs. More details about the conducted experiments can be found in Appendix G. The code is available on https://github.com/sghalebikesabi/GVI-WGF.
Researcher Affiliation | Academia | Veit D. Wild (University of Oxford), Sahra Ghalebikesabi (University of Oxford), Dino Sejdinovic (University of Adelaide), Jeremias Knoblauch (University College London)
Pseudocode | No | The paper describes algorithmic steps in narrative text (e.g., 'Step 1: Sample N_E ∈ ℕ particles...', 'Step 2: Evolve the particle...') rather than in a formally structured pseudocode or algorithm block. A hedged sketch of such a particle-evolution loop is given after the table.
Open Source Code | Yes | The code is available on https://github.com/sghalebikesabi/GVI-WGF.
Open Datasets | Yes | Table 1 compares the average (Gaussian) negative log likelihood in the test set for the three ID-GVI methods on some UCI regression data sets (Lichman, 2013).
Dataset Splits | Yes | We split each data set into train (81% of samples), validation (9% of samples), and test set (10% of samples).
Hardware Specification | Yes | While the final experimental results can be run within approximately an hour on a single GeForce RTX 3090 GPU, the complete compute needed for the final results, debugging runs, and sweeps amounts to around 9 days.
Software Dependencies | No | The paper mentions common machine learning practices (e.g., neural networks), implying the use of associated software frameworks, but does not specify any software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.x', 'CUDA x.x').
Experiment Setup | Yes | We train 5 one-hidden-layer neural networks f_θ with 50 hidden nodes for 40 epochs. ... Initialisation: Kaiming initialisation, i.e. for each layer l ∈ {1, ..., L} that maps features with dimensionality n_{l-1} into dimensionality n_l, we sample Q_{l,0} ~ N(0, 2/n_l). Reg. parameter: λ_DLE = 10^-4, λ_DRLE = 10^-4, λ'_DRLE = 10^-2. Step size: η = 0.1. Iterations: K = 10,000. A configuration sketch combining these values with the reported data split is given after the table.
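
The narrative steps noted in the Pseudocode row (sample particles, then evolve them) describe a discretised particle method. Below is a minimal, hedged sketch of such a Langevin-type particle-evolution loop in Python; the quadratic toy loss, the function names, and the dimensions are illustrative assumptions rather than the authors' implementation, and the repulsion term used by DRLE is omitted.

    import numpy as np

    rng = np.random.default_rng(0)

    def loss_grad(theta):
        # Illustrative stand-in for the gradient of the empirical loss of f_theta;
        # here simply the gradient of the quadratic toy loss 0.5 * ||theta||^2.
        return theta

    def evolve_particles(n_particles=5, dim=10, eta=0.1, lam=1e-4, n_iters=10_000):
        # Step 1: sample N_E particles (standard-normal initialisation as a placeholder).
        particles = rng.normal(size=(n_particles, dim))
        # Step 2: evolve each particle with noisy (Langevin-type) gradient steps,
        # i.e. an Euler discretisation with step size eta and diffusion scaled by lam.
        for _ in range(n_iters):
            noise = rng.normal(size=particles.shape)
            particles = (particles
                         - eta * loss_grad(particles)
                         + np.sqrt(2.0 * eta * lam) * noise)
        return particles

    ensemble = evolve_particles()
    print(ensemble.shape)  # (5, 10): one parameter vector per ensemble member

With lam = 0 the loop reduces to independent gradient-descent runs, i.e. a plain deep ensemble; the paper's point is that adding the noise term (and, for DRLE, a repulsion term) turns the same particle scheme into an approximate (variational) Bayesian method.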
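For the Dataset Splits and Experiment Setup rows, the reported values (81/9/10 split, 5 one-hidden-layer networks with 50 hidden nodes, Kaiming initialisation) can be mirrored in a few lines. This is a sketch under assumptions: NumPy only, an arbitrary feature dimension, and helper names that are not taken from the authors' repository.

    import numpy as np

    rng = np.random.default_rng(0)

    def split_indices(n, train=0.81, val=0.09):
        # 81% train / 9% validation / 10% test, as reported in the paper.
        perm = rng.permutation(n)
        n_train, n_val = int(train * n), int(val * n)
        return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

    def kaiming_layers(dims):
        # He/Kaiming-style initialisation: the layer mapping n_{l-1} -> n_l features
        # gets zero-mean normal weights with variance 2/fan-in (the usual Kaiming
        # choice; the quoted setup writes the variance as 2/n_l).
        return [rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))
                for n_in, n_out in zip(dims[:-1], dims[1:])]

    n_features = 8  # e.g. a small UCI regression data set
    # Ensemble of 5 one-hidden-layer networks with 50 hidden nodes each.
    ensemble = [kaiming_layers([n_features, 50, 1]) for _ in range(5)]

    train_idx, val_idx, test_idx = split_indices(1000)
    print(len(train_idx), len(val_idx), len(test_idx))  # 810 90 100

Training each ensemble member for 40 epochs with step size η = 0.1 over K = 10,000 iterations would then follow a particle update of the kind sketched above.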