Implicit Variational Inference for High-Dimensional Posteriors
Authors: Anshuk Uppal, Kristoffer Stensbo-Smidt, Wouter Boomsma, Jes Frellsen
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis indicates our method is capable of recovering correlations across layers in large Bayesian neural networks, a property that is crucial for a network's performance but notoriously challenging to achieve. Through experiments on downstream tasks, we demonstrate that our expressive posteriors outperform state-of-the-art uncertainty quantification methods, validating the effectiveness of our training algorithm and the quality of the learned implicit distribution. |
| Researcher Affiliation | Academia | Anshuk Uppal Technical University of Denmark ansup@dtu.dk Kristoffer Stensbo-Smidt Technical University of Denmark krss@dtu.dk Wouter Boomsma University of Copenhagen wb@di.ku.dk Jes Frellsen Technical University of Denmark jefr@dtu.dk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Supporting code is available here: https://github.com/UppalAnshuk/LIVI_neurips23 |
| Open Datasets | Yes | For regression tasks, we use a Gaussian likelihood and learn the homoscedastic variance, η², using type-II maximum likelihood, p(y \| f_θ(x)) = N(y \| µ = f_θ(x), η²). For classification tasks, we use a categorical likelihood function for the labels with logits produced by f_θ. We use Bayesian neural networks (BNNs) for our analysis due to the immense number of global latent variables (i.e., the weights and biases) that modern BNN architectures require to perform well with larger datasets, validating our method on previously unreachable and untested scales for implicit VI. (See the likelihood sketch after the table.) |
| Dataset Splits | No | The paper mentions 'test set' and 'training data' but does not explicitly provide percentages, sample counts, or citations to predefined splits for all train/validation/test sets. |
| Hardware Specification | Yes | Training LIVI with the aforementioned generator architecture takes 12.3 GPU minutes on a Titan X (Pascal, 12 GB VRAM) GPU and consumes 980 megabytes of memory when training with a batch size of 256. Keeping everything else the same, each network of the ensemble takes 34 minutes to train on CIFAR10 using an A5000 (24 GB VRAM), for a total of 2.83 GPU hours, whilst our method/architecture requires 2.30 GPU hours to train on the same dataset using the same GPU. |
| Software Dependencies | No | The paper mentions libraries such as 'Pyro', 'bayesian-torch', and the 'StochMan' library but does not specify their version numbers. |
| Experiment Setup | Yes | We use a 2D toy sinusoidal dataset for this example with 70 training data points. The BNN architecture is shared across all the methods and is a two-hidden-layer MLP with 7 and 10 units and ELU activations, trained with losses unique to each of the methods. For all our MNIST experiments we use an MMNN with 65 × 65 input noise and one hidden layer of size 250 × 250, producing an output matrix of size 350 × 127, the elements of which are used as the weights and biases of the LeNet BNN. For all experiments, we trained without dataset augmentation and with a maximum of 1 to 2 samples (from q_γ(θ)) per minibatch. (See the architecture sketch after the table.) |
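
The likelihood setup quoted under Open Datasets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `GaussianLikelihood` and `categorical_log_prob` are our own, and the only assumption is a PyTorch-style network f_θ producing predictions or logits. The homoscedastic noise scale η is parameterised on the log scale and learned jointly with the other parameters, matching the type-II maximum-likelihood treatment of η² described in the excerpt.

```python
import torch
import torch.nn as nn
import torch.distributions as dist


class GaussianLikelihood(nn.Module):
    """Regression likelihood p(y | f_theta(x)) = N(y | mu = f_theta(x), eta^2).

    Sketch only: log(eta) is a learned scalar, so optimising the training
    objective with respect to it corresponds to type-II maximum likelihood
    for the homoscedastic variance eta^2.
    """

    def __init__(self):
        super().__init__()
        self.log_eta = nn.Parameter(torch.zeros(()))  # log of the noise std eta

    def log_prob(self, f_theta_x, y):
        # Sum of per-datapoint Gaussian log-densities with mean f_theta(x)
        return dist.Normal(f_theta_x, self.log_eta.exp()).log_prob(y).sum()


def categorical_log_prob(logits, labels):
    """Classification likelihood: categorical over labels, logits from f_theta(x)."""
    return dist.Categorical(logits=logits).log_prob(labels).sum()
```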
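
The architectures quoted under Experiment Setup can likewise be sketched. The toy-regression BNN below assumes 1-D inputs and outputs (the dataset is described only as a 2D sinusoidal toy problem), and the generator is a guess at the matrix-multiplication network (MMNN): the left/right-multiplication form Z → act(A·Z·B) and the initialisation scale are our assumptions, chosen only to reproduce the quoted shapes (65 × 65 noise → 250 × 250 hidden → 350 × 127 output).

```python
import torch
import torch.nn as nn


def make_toy_bnn():
    # Two-hidden-layer MLP with 7 and 10 units and ELU activations
    # (1-D input/output assumed for the sinusoidal toy regression task).
    return nn.Sequential(
        nn.Linear(1, 7), nn.ELU(),
        nn.Linear(7, 10), nn.ELU(),
        nn.Linear(10, 1),
    )


class MMNNGenerator(nn.Module):
    """Maps a 65x65 noise matrix to a 350x127 matrix of BNN parameters.

    Assumed MMNN layer form: Z -> act(A @ Z @ B); the paper's exact
    formulation may differ.
    """

    def __init__(self):
        super().__init__()
        self.A1 = nn.Parameter(torch.randn(250, 65) * 0.05)
        self.B1 = nn.Parameter(torch.randn(65, 250) * 0.05)
        self.A2 = nn.Parameter(torch.randn(350, 250) * 0.05)
        self.B2 = nn.Parameter(torch.randn(250, 127) * 0.05)
        self.act = nn.ELU()

    def forward(self, z):                       # z: (65, 65) noise matrix
        h = self.act(self.A1 @ z @ self.B1)     # (250, 250) hidden layer
        return self.A2 @ h @ self.B2            # (350, 127) output matrix


# One forward pass yields one implicit posterior sample of 350 x 127 = 44,450
# entries, which would then be sliced into the LeNet weights and biases.
gen = MMNNGenerator()
theta = gen(torch.randn(65, 65))
```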