Walsh-Hadamard Variational Inference for Bayesian Deep Learning

Authors: Simone Rossi, Sébastien Marmin, Maurizio Filippone

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive theoretical and empirical analyses demonstrate that WHVI yields considerable speedups and model reductions compared to other techniques to carry out approximate inference for over-parameterized models, and ultimately show how advances in kernel methods can be translated into advances in approximate Bayesian inference for Deep Learning."
Researcher Affiliation | Academia | Simone Rossi, Data Science Department, EURECOM (FR), simone.rossi@eurecom.fr; Sébastien Marmin, Data Science Department, EURECOM (FR), sebastien.marmin@eurecom.fr; Maurizio Filippone, Data Science Department, EURECOM (FR), maurizio.filippone@eurecom.fr
Pseudocode | Yes | "Algorithm 1: Setup dimensions for non-squared matrix" (the WHVI parameterization behind this algorithm is sketched after the table)
Open Source Code | No | The paper provides GitHub links in footnotes for other methods (e.g., Noisy Natural Gradient) but gives no link to, or explicit statement about, the open-source availability of its own WHVI code.
Open Datasets | Yes | "We conduct a series of comparisons with state-of-the-art VI schemes for Bayesian DNNs; see the Supplement for the list of data sets used in the experiments. ... For this experiment, we replace all fully-connected layers in the CNN with the WHVI parameterization, while the convolutional filters are treated variationally using MCD. In this setup, we fit VGG16 [49], ALEXNET [29] and RESNET-18 [21] on CIFAR10. ... The dataset used is YACHT."
Dataset Splits | Yes | "Data is randomly divided into 90%/10% splits for training and testing eight times." (a reproduction sketch follows the table)
Hardware Specification | Yes | "The workstation used is equipped with two Intel Xeon CPUs, four NVIDIA Tesla P100 and 512 GB of RAM."
Software Dependencies | No | The paper mentions 'PYTORCH [44]', 'TENSORFLOW2', and the 'nvidia-smi' tool, but does not give version numbers for these or any other software components.
Experiment Setup | Yes | "We set the network to have two hidden layers and 128 features with ReLU activations (as a reference, these models are 20 times bigger than the usual setup, which uses a single hidden layer with 50/100 units). ... We test four models: mean-field Gaussian VI (MFG), Monte Carlo dropout [MCD; 14] with dropout rate 0.4 and two variants of WHVI: G-WHVI with Gaussian posterior and NF-WHVI with planar flows (10 planar flows)." (a PyTorch sketch of this setup follows the table)
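
As context for the Pseudocode row: WHVI parameterizes a D×D weight matrix as W = S1 H diag(s̃) H S2, where H is a Walsh-Hadamard matrix, S1 and S2 are learned diagonal matrices, and the vector s̃ carries the Gaussian variational posterior. Because H never needs to be materialized, a product W u costs O(D log D) time and O(D) parameters via the fast Walsh-Hadamard transform, which is the source of the speedups and model reductions quoted in the Research Type row. A minimal NumPy sketch of that product (our illustration, not the authors' code; normalization constants and Algorithm 1's handling of non-squared matrices are omitted):

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform, O(D log D).
    The length of x must be a power of two."""
    x = x.copy()
    d = x.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def whvi_matvec(s1, s_tilde, s2, u):
    """Compute W u with W = S1 H diag(s_tilde) H S2, never forming W."""
    v = fwht(s2 * u)        # H S2 u
    v = fwht(s_tilde * v)   # H diag(s_tilde) H S2 u
    return s1 * v           # S1 H diag(s_tilde) H S2 u

# Reparameterized sample of s_tilde from its Gaussian posterior
# (all values here are illustrative, not taken from the paper).
rng = np.random.default_rng(0)
D = 8                                             # must be a power of two
s1, s2 = rng.normal(size=D), rng.normal(size=D)   # deterministic diagonals
mu, sigma = rng.normal(size=D), 0.1 * np.ones(D)  # variational parameters
s_tilde = mu + sigma * rng.normal(size=D)
print(whvi_matvec(s1, s_tilde, s2, rng.normal(size=D)))
```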
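
The split protocol from the Dataset Splits row is straightforward to reproduce. A minimal sketch, assuming scikit-learn; the placeholder data only mimics the shape of the UCI YACHT set, and the seeds are illustrative rather than taken from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like UCI YACHT (308 examples, 6 features);
# substitute the real dataset here.
X = np.random.randn(308, 6)
y = np.random.randn(308)

# Eight random 90%/10% train/test splits, per the paper's protocol.
for seed in range(8):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.10, random_state=seed)
    # ... train on (X_tr, y_tr), evaluate on (X_te, y_te) ...
```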
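
Likewise, the architecture and the MCD baseline from the Experiment Setup row can be sketched in PyTorch. Only the two hidden layers, the width of 128, the ReLU activations, and the 0.4 dropout rate come from the paper; the input/output dimensions and sample count are hypothetical placeholders:

```python
import torch
import torch.nn as nn

D_IN, D_OUT, D_H = 6, 1, 128  # 128 hidden features per the paper; I/O dims are placeholders

# MCD baseline: a two-hidden-layer ReLU network with dropout rate 0.4.
mcd_net = nn.Sequential(
    nn.Linear(D_IN, D_H), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(D_H, D_H), nn.ReLU(), nn.Dropout(p=0.4),
    nn.Linear(D_H, D_OUT),
)

def mc_dropout_predict(model, x, n_samples=64):
    """Monte Carlo dropout prediction: keep dropout active at test time
    and average over stochastic forward passes."""
    model.train()  # keeps the Dropout layers sampling
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)

mean, std = mc_dropout_predict(mcd_net, torch.randn(16, D_IN))
```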