Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Authors: Simone Rossi, Sébastien Marmin, Maurizio Filippone

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques to carry out approximate inference for over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference for Deep Learning."
Researcher Affiliation | Academia | Simone Rossi, Data Science Department, EURECOM (FR), simone.rossi@eurecom.fr; Sébastien Marmin, Data Science Department, EURECOM (FR), sebastien.marmin@eurecom.fr; Maurizio Filippone, Data Science Department, EURECOM (FR), maurizio.filippone@eurecom.fr
Pseudocode | Yes | "Algorithm 1: Setup dimensions for non-squared matrix" (the WHVI parameterization behind this algorithm is sketched after the table)
Open Source Code | No | The paper provides GitHub links in footnotes for other methods (e.g., Noisy Natural Gradient) but gives no link to, or explicit statement about, the open-source availability of its own WHVI code.
Open Datasets | Yes | "We conduct a series of comparisons with state-of-the-art VI schemes for Bayesian DNNs; see the Supplement for the list of data sets used in the experiments. ... For this experiment, we replace all fully-connected layers in the CNN with the WHVI parameterization, while the convolutional filters are treated variationally using MCD. In this setup, we fit VGG16 [49], ALEXNET [29] and RESNET-18 [21] on CIFAR10. ... The dataset used is YACHT."
Dataset Splits | Yes | "Data is randomly divided into 90%/10% splits for training and testing eight times." (a reproduction sketch follows the table)
Hardware Specification | Yes | "The workstation used is equipped with two Intel Xeon CPUs, four NVIDIA Tesla P100 and 512 GB of RAM."
Software Dependencies | No | The paper mentions 'PYTORCH [44]', 'TENSORFLOW2', and the 'nvidia-smi' tool, but does not give version numbers for these or any other software components.
Experiment Setup | Yes | "We set the network to have two hidden layers and 128 features with ReLU activations (as a reference, these models are 20 times bigger than the usual setup, which uses a single hidden layer with 50/100 units). ... We test four models: mean-field Gaussian VI (MFG), Monte Carlo dropout [MCD; 14] with dropout rate 0.4 and two variants of WHVI: G-WHVI with Gaussian posterior and NF-WHVI with planar flows (10 planar flows)." (a PyTorch sketch of this setup follows the table)
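
As context for the Pseudocode row: WHVI parameterizes a D×D weight matrix as W = S1 H diag(s̃) H S2, where H is a Walsh-Hadamard matrix, S1 and S2 are learned diagonal matrices, and the vector s̃ carries the Gaussian variational posterior. Because H never needs to be materialized, a product W u costs O(D log D) time and O(D) parameters via the fast Walsh-Hadamard transform, which is the source of the speedups and model reductions quoted in the Research Type row. A minimal NumPy sketch of that product (our illustration, not the authors' code; normalization constants and Algorithm 1's handling of non-squared matrices are omitted):

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform, O(D log D).
    The length of x must be a power of two."""
    x = x.copy()
    d = x.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def whvi_matvec(s1, s_tilde, s2, u):
    """Compute W u with W = S1 H diag(s_tilde) H S2, never forming W."""
    v = fwht(s2 * u)        # H S2 u
    v = fwht(s_tilde * v)   # H diag(s_tilde) H S2 u
    return s1 * v           # S1 H diag(s_tilde) H S2 u

# Reparameterized sample of s_tilde from its Gaussian posterior
# (all values here are illustrative, not taken from the paper).
rng = np.random.default_rng(0)
D = 8                                             # must be a power of two
s1, s2 = rng.normal(size=D), rng.normal(size=D)   # deterministic diagonals
mu, sigma = rng.normal(size=D), 0.1 * np.ones(D)  # variational parameters
s_tilde = mu + sigma * rng.normal(size=D)
print(whvi_matvec(s1, s_tilde, s2, rng.normal(size=D)))
```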
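
The split protocol from the Dataset Splits row is straightforward to reproduce. A minimal sketch, assuming scikit-learn; the placeholder data only mimics the shape of the UCI YACHT set, and the seeds are illustrative rather than taken from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like UCI YACHT (308 examples, 6 features);
# substitute the real dataset here.
X = np.random.randn(308, 6)
y = np.random.randn(308)

# Eight random 90%/10% train/test splits, per the paper's protocol.
for seed in range(8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.10, random_state=seed)
    # ... train on (X_tr, y_tr), evaluate on (X_te, y_te) ...
```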
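
Likewise, the architecture and the MCD baseline from the Experiment Setup row can be sketched in PyTorch. Only the two hidden layers, the width of 128, the ReLU activations, and the 0.4 dropout rate come from the paper; the input/output dimensions and sample count are hypothetical placeholders:

```python
import torch
import torch.nn as nn

D_IN, D_OUT, D_H = 6, 1, 128  # 128 hidden features per the paper; I/O dims are placeholders

# MCD baseline: a two-hidden-layer ReLU network with dropout rate 0.4.
mcd_net = nn.Sequential(
    nn.Linear(D_IN, D_H), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(D_H, D_H), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(D_H, D_OUT),
)

def mc_dropout_predict(model, x, n_samples=64):
    """Monte Carlo dropout prediction: keep dropout active at test time
    and average over stochastic forward passes."""
    model.train()  # keeps the Dropout layers sampling
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)

mean, std = mc_dropout_predict(mcd_net, torch.randn(16, D_IN))
```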