Implicit Variational Inference for High-Dimensional Posteriors

Authors: Anshuk Uppal, Kristoffer Stensbo-Smidt, Wouter Boomsma, Jes Frellsen

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical analysis indicates our method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. Through experiments on downstream tasks, we demonstrate that our expressive posteriors outperform state-of-the-art uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit distribution.
Researcher Affiliation Academia Anshuk Uppal (Technical University of Denmark, ansup@dtu.dk); Kristoffer Stensbo-Smidt (Technical University of Denmark, krss@dtu.dk); Wouter Boomsma (University of Copenhagen, wb@di.ku.dk); Jes Frellsen (Technical University of Denmark, jefr@dtu.dk)
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Supporting code is available here: https://github.com/UppalAnshuk/LIVI_neurips23
Open Datasets Yes For regression tasks, we use a Gaussian likelihood and learn the homoscedastic variance, η², using type-II maximum likelihood, p(y | fθ(x)) = N(y | µ = fθ(x), η²). For classification tasks, we use a categorical likelihood function for the labels with logits produced by fθ. We use Bayesian Neural Networks (BNNs) for our analysis due to the immense number of global latent variables (i.e., the weights and biases) that are required by modern BNN architectures to perform well with larger datasets, validating our method on previously unreachable and untested scales for implicit VI.
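The quoted likelihood setup can be sketched in PyTorch: the homoscedastic variance η² is a free parameter optimised alongside the network weights (type-II maximum likelihood). This is a minimal illustration of that idea, not the authors' code; the class and method names are hypothetical.

```python
import math
import torch
import torch.nn as nn


class GaussianLikelihoodHead(nn.Module):
    """Gaussian likelihood N(y | mu = f(x), eta^2) with a learned
    homoscedastic variance eta^2 (type-II maximum likelihood).

    The log of eta^2 is stored for numerical stability and optimised
    jointly with the rest of the model's parameters.
    """

    def __init__(self):
        super().__init__()
        self.log_eta2 = nn.Parameter(torch.zeros(()))  # log eta^2, init eta^2 = 1

    def log_likelihood(self, f_x, y):
        # Sum of per-point Gaussian log-densities:
        # -0.5 * [ (y - f(x))^2 / eta^2 + log eta^2 + log 2*pi ]
        eta2 = self.log_eta2.exp()
        per_point = (y - f_x) ** 2 / eta2 + self.log_eta2 + math.log(2 * math.pi)
        return -0.5 * per_point.sum()
```

Maximising this term (plus the variational regulariser) with respect to `log_eta2` implements the type-II maximum-likelihood treatment of the noise variance described in the quote.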
Dataset Splits No The paper mentions 'test set' and 'training data' but does not explicitly provide percentages, sample counts, or citations to predefined splits for all train/validation/test sets.
Hardware Specification Yes To train LIVI with the aforementioned generator architecture takes 12.3 GPU minutes on a Titan X (Pascal, 12 GB VRAM) GPU and consumes 980 megabytes of memory when training with a batch size of 256. Keeping everything else the same, each network of the ensemble takes 34 minutes to train on CIFAR10 using an A5000 (24 GB VRAM), for a total of 2.83 GPU hours, whilst our method/architecture requires 2.30 GPU hours to train on the same dataset using the same GPU.
Software Dependencies No The paper mentions libraries such as 'Pyro', 'bayesian-torch', and the 'StochMan' library, but does not specify their version numbers.
Experiment Setup Yes We use a 2D toy sinusoidal dataset for this example with 70 training data points. The BNN architecture is shared across all the methods and is a two-hidden-layered MLP with 7 and 10 units and ELU activations trained with losses unique to each of the methods. For all our MNIST experiments we use an MMNN with 65 × 65 input noise, one hidden layer of size 250 × 250, and produce an output matrix of size 350 × 127, the elements of which are used as weights and biases of the LeNet BNN. For all experiments, we trained without dataset augmentation and with a maximum of 1 to 2 samples (from qγ(θ)) per minibatch.
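The quoted generator maps a 65 × 65 noise matrix through a 250 × 250 hidden layer to a 350 × 127 output matrix whose entries parameterise the LeNet BNN. One plausible reading of such a matrix-valued generator uses two-sided linear maps (A @ Z @ B), which keeps the parameter count far below a flat MLP of the same input/output size; this sketch is an assumption about the architecture, not the authors' implementation, and all names are illustrative.

```python
import torch
import torch.nn as nn


class MatrixHyperGenerator(nn.Module):
    """Sketch of a matrix-valued hypernetwork (generator).

    A 65x65 noise matrix Z is transformed by two-sided matrix products:
      h     = tanh(A1 @ Z @ B1)   -> 250 x 250 hidden activation
      theta = A2 @ h @ B2         -> 350 x 127 output matrix
    The flattened output serves as the weight/bias vector of the
    downstream BNN. Shapes mirror the quoted setup; the two-sided
    parameterisation itself is a hypothetical design choice.
    """

    def __init__(self, noise=65, hidden=250, out_rows=350, out_cols=127):
        super().__init__()
        self.A1 = nn.Parameter(torch.randn(hidden, noise) * 0.05)
        self.B1 = nn.Parameter(torch.randn(noise, hidden) * 0.05)
        self.A2 = nn.Parameter(torch.randn(out_rows, hidden) * 0.05)
        self.B2 = nn.Parameter(torch.randn(hidden, out_cols) * 0.05)

    def forward(self, z):
        h = torch.tanh(self.A1 @ z @ self.B1)  # (250, 250)
        theta = self.A2 @ h @ self.B2          # (350, 127)
        return theta.flatten()                 # flat parameter vector


gen = MatrixHyperGenerator()
theta = gen(torch.randn(65, 65))  # one posterior sample of the BNN parameters
```

Drawing 1–2 such θ samples per minibatch, as the quote describes, then amounts to one or two forward passes through this generator per gradient step.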