Implicit Variational Inference for High-Dimensional Posteriors

Authors: Anshuk Uppal, Kristoffer Stensbo-Smidt, Wouter Boomsma, Jes Frellsen

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical analysis indicates our method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. Through experiments on downstream tasks, we demonstrate that our expressive posteriors outperform state-of-the-art uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit distribution.
Researcher Affiliation Academia Anshuk Uppal (Technical University of Denmark, ansup@dtu.dk); Kristoffer Stensbo-Smidt (Technical University of Denmark, krss@dtu.dk); Wouter Boomsma (University of Copenhagen, wb@di.ku.dk); Jes Frellsen (Technical University of Denmark, jefr@dtu.dk)
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Supporting code is available here: https://github.com/UppalAnshuk/LIVI_neurips23
Open Datasets Yes For regression tasks, we use a Gaussian likelihood and learn the homoscedastic variance, η², using type-II maximum likelihood, p(y | fθ(x)) = N(y | µ = fθ(x), η²). For classification tasks, we use a categorical likelihood function for the labels with logits produced by fθ. We use Bayesian Neural Networks (BNNs) for our analysis due to the immense number of global latent variables (i.e., the weights and biases) that are required by modern BNN architectures to perform well with larger datasets, validating our method on previously unreachable and untested scales for implicit VI.
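The quoted likelihood setup can be sketched in PyTorch: the homoscedastic variance η² is a free parameter optimised alongside the network weights (type-II maximum likelihood). This is a minimal illustration of that idea, not the authors' code; the class and method names are hypothetical.

```python
import math
import torch
import torch.nn as nn


class GaussianLikelihoodHead(nn.Module):
    """Gaussian likelihood N(y | mu = f(x), eta^2) with a learned
    homoscedastic variance eta^2 (type-II maximum likelihood).

    The log of eta^2 is stored for numerical stability and optimised
    jointly with the rest of the model's parameters.
    """

    def __init__(self):
        super().__init__()
        self.log_eta2 = nn.Parameter(torch.zeros(()))  # log eta^2, init eta^2 = 1

    def log_likelihood(self, f_x, y):
        # Sum of per-point Gaussian log-densities:
        # -0.5 * [ (y - f(x))^2 / eta^2 + log eta^2 + log 2*pi ]
        eta2 = self.log_eta2.exp()
        per_point = (y - f_x) ** 2 / eta2 + self.log_eta2 + math.log(2 * math.pi)
        return -0.5 * per_point.sum()
```

Maximising this term (plus the variational regulariser) with respect to `log_eta2` implements the type-II maximum-likelihood treatment of the noise variance described in the quote.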
Dataset Splits No The paper mentions 'test set' and 'training data' but does not explicitly provide percentages, sample counts, or citations to predefined splits for all train/validation/test sets.
Hardware Specification Yes To train LIVI with the aforementioned generator architecture takes 12.3 GPU minutes on a Titan X (Pascal, 12 GB VRAM) GPU and consumes 980 megabytes of memory when training with a batch size of 256. Keeping everything else the same, each network of the ensemble takes 34 minutes to train on CIFAR10 using an A5000 (24 GB VRAM), for a total of 2.83 GPU hours, whilst our method/architecture requires 2.30 GPU hours to train on the same dataset using the same GPU.
Software Dependencies No The paper mentions libraries such as 'Pyro', 'bayesian-torch', and the 'StochMan' library, but does not specify their version numbers.
Experiment Setup Yes We use a 2D toy sinusoidal dataset for this example with 70 training data points. The BNN architecture is shared across all the methods and is a two-hidden-layered MLP with 7 and 10 units and ELU activations trained with losses unique to each of the methods. For all our MNIST experiments we use an MMNN with 65 × 65 input noise, one hidden layer of size 250 × 250, and produce an output matrix of size 350 × 127, the elements of which are used as weights and biases of the LeNet BNN. For all experiments, we trained without dataset augmentation and with a maximum of 1 to 2 samples (from qγ(θ)) per minibatch.
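The quoted generator maps a 65 × 65 noise matrix through a 250 × 250 hidden layer to a 350 × 127 output matrix whose entries parameterise the LeNet BNN. One plausible reading of such a matrix-valued generator uses two-sided linear maps (A @ Z @ B), which keeps the parameter count far below a flat MLP of the same input/output size; this sketch is an assumption about the architecture, not the authors' implementation, and all names are illustrative.

```python
import torch
import torch.nn as nn


class MatrixHyperGenerator(nn.Module):
    """Sketch of a matrix-valued hypernetwork (generator).

    A 65x65 noise matrix Z is transformed by two-sided matrix products:
      h     = tanh(A1 @ Z @ B1)   -> 250 x 250 hidden activation
      theta = A2 @ h @ B2         -> 350 x 127 output matrix
    The flattened output serves as the weight/bias vector of the
    downstream BNN. Shapes mirror the quoted setup; the two-sided
    parameterisation itself is a hypothetical design choice.
    """

    def __init__(self, noise=65, hidden=250, out_rows=350, out_cols=127):
        super().__init__()
        self.A1 = nn.Parameter(torch.randn(hidden, noise) * 0.05)
        self.B1 = nn.Parameter(torch.randn(noise, hidden) * 0.05)
        self.A2 = nn.Parameter(torch.randn(out_rows, hidden) * 0.05)
        self.B2 = nn.Parameter(torch.randn(hidden, out_cols) * 0.05)

    def forward(self, z):
        h = torch.tanh(self.A1 @ z @ self.B1)  # (250, 250)
        theta = self.A2 @ h @ self.B2          # (350, 127)
        return theta.flatten()                 # flat parameter vector


gen = MatrixHyperGenerator()
theta = gen(torch.randn(65, 65))  # one posterior sample of the BNN parameters
```

Drawing 1–2 such θ samples per minibatch, as the quote describes, then amounts to one or two forward passes through this generator per gradient step.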