Walsh-Hadamard Variational Inference for Bayesian Deep Learning
Authors: Simone Rossi, Sébastien Marmin, Maurizio Filippone
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques to carry out approximate inference for over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference for Deep Learning. |
| Researcher Affiliation | Academia | Simone Rossi, Data Science Department, EURECOM (FR), simone.rossi@eurecom.fr; Sébastien Marmin, Data Science Department, EURECOM (FR), sebastien.marmin@eurecom.fr; Maurizio Filippone, Data Science Department, EURECOM (FR), maurizio.filippone@eurecom.fr |
| Pseudocode | Yes | Algorithm 1: Setup dimensions for non-squared matrix (a sketch of the underlying WHVI parameterization follows the table) |
| Open Source Code | No | The paper provides GitHub links in footnotes for other methods (e.g., Noisy Natural Gradient) but gives no link to, or explicit statement about, open-source code for the authors' own WHVI method. |
| Open Datasets | Yes | We conduct a series of comparisons with state-of-the-art VI schemes for Bayesian DNNs; see the Supplement for the list of data sets used in the experiments. ... For this experiment, we replace all fully-connected layers in the CNN with the WHVI parameterization, while the convolutional filters are treated variationally using MCD. In this setup, we fit VGG16 [49], AlexNet [29] and ResNet-18 [21] on CIFAR10. ... The dataset used is YACHT. |
| Dataset Splits | Yes | Data is randomly divided into 90%/10% splits for training and testing eight times (a sketch of this protocol follows the table). |
| Hardware Specification | Yes | The workstation used is equipped with two Intel Xeon CPUs, four NVIDIA Tesla P100 and 512 GB of RAM. |
| Software Dependencies | No | The paper mentions PyTorch [44], TensorFlow, and the nvidia-smi tool, but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We set the network to have two hidden layers and 128 features with ReLU activations (as a reference, these models are 20 times bigger than the usual setup, which uses a single hidden layer with 50/100 units). ... We test four models: mean-field Gaussian VI (MFG), Monte Carlo dropout [MCD; 14] with dropout rate 0.4, and two variants of WHVI: G-WHVI with Gaussian posterior and NF-WHVI with planar flows (10 planar flows). An architecture sketch follows the table. |
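
For context on the Pseudocode row: WHVI parameterizes a square D × D weight matrix (with D a power of two) as W = S1 H diag(g) H S2, where H is the Walsh-Hadamard matrix, S1 and S2 are learned diagonal matrices, and g carries the variational posterior; the paper's Algorithm 1 extends this to non-square shapes via padding and concatenation, which is omitted here. Below is a minimal PyTorch sketch of the square case with a fully factorized Gaussian posterior over g. The class and variable names are illustrative, not the authors' code.

```python
import torch


def fwht(x: torch.Tensor) -> torch.Tensor:
    """Unnormalized fast Walsh-Hadamard transform along the last
    dimension (length must be a power of two). Runs in O(D log D),
    so H never has to be materialized as a dense D x D matrix."""
    d = x.shape[-1]
    if d == 1:
        return x
    a, b = x[..., : d // 2], x[..., d // 2:]
    return torch.cat([fwht(a + b), fwht(a - b)], dim=-1)


class WHVILinear(torch.nn.Module):
    """Sketch of one square WHVI weight matrix:
    W = S1 H diag(g) H S2, with a factorized Gaussian variational
    posterior N(mu, diag(sigma^2)) over g. The paper also covers
    full-covariance posteriors, normalizing flows, and non-square
    shapes (Algorithm 1); those cases are omitted here."""

    def __init__(self, D: int):
        super().__init__()
        assert D & (D - 1) == 0, "D must be a power of two"
        self.D = D
        self.s1 = torch.nn.Parameter(torch.randn(D))  # diagonal of S1
        self.s2 = torch.nn.Parameter(torch.randn(D))  # diagonal of S2
        self.mu = torch.nn.Parameter(torch.zeros(D))  # posterior mean of g
        self.log_sigma = torch.nn.Parameter(torch.full((D,), -3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One reparameterized sample g ~ N(mu, diag(sigma^2)).
        g = self.mu + torch.exp(self.log_sigma) * torch.randn_like(self.mu)
        # W x = S1 H diag(g) H S2 x, applied right to left; dividing by D
        # accounts for the 1/sqrt(D) normalization of each H.
        h = fwht(x * self.s2) * g
        return fwht(h) * self.s1 / self.D
```

Each such layer stores O(D) parameters and multiplies in O(D log D) time, versus O(D^2) for a dense mean-field Gaussian layer; this is the source of the speedups and model reductions quoted in the Research Type row.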
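
The Dataset Splits row describes eight random 90%/10% train/test splits. A minimal sketch of that protocol with scikit-learn follows; the seed values and the use of `train_test_split` are assumptions, since the paper does not report how the splits were generated.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def make_splits(X, y, n_repeats=8, test_size=0.10):
    """Eight independent random 90%/10% splits, one per seed.
    Seeds 0..7 are placeholders; the paper does not report them."""
    return [
        train_test_split(X, y, test_size=test_size, random_state=seed)
        for seed in range(n_repeats)
    ]


# e.g. with UCI YACHT-sized arrays (308 rows, 6 input features):
X, y = np.random.randn(308, 6), np.random.randn(308)  # placeholder data
splits = make_splits(X, y)  # list of (X_train, X_test, y_train, y_test)
```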
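
Finally, the Experiment Setup row specifies two hidden layers of 128 units with ReLU activations. A sketch of that architecture with plain `nn.Linear` layers is below; under G-WHVI or NF-WHVI each 128 × 128 weight matrix would instead use the Walsh-Hadamard parameterization sketched above (128 = 2^7, so no dimension padding is needed). The helper name and the default output dimension are illustrative.

```python
import torch.nn as nn


def make_regression_net(d_in: int, d_out: int = 1, width: int = 128):
    """Two hidden layers of 128 ReLU units, as reported in the paper's
    regression setup; shown with standard dense layers for clarity."""
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )
```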