Copula-like Variational Inference

Authors: Marcel Hirt, Petros Dellaportas, Alain Durmus

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: variable, result, and LLM response for each item.
Research Type: Experimental. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
Researcher Affiliation: Academia. Marcel Hirt, Department of Statistical Science, University College London, UK (marcel.hirt.16@ucl.ac.uk); Petros Dellaportas, Department of Statistical Science, University College London, UK, Department of Statistics, Athens University of Economics and Business, Greece, and The Alan Turing Institute, UK; Alain Durmus, CMLA, École normale supérieure Paris-Saclay, CNRS, Université Paris-Saclay, 94235 Cachan, France (alain.durmus@cmla.ens-cachan.fr).
Pseudocode: Yes. Algorithm 1 (Sampling from the rotated copula-like density): 1: Sample (V_1, ..., V_d) ∼ c_θ using Proposition 1. 2: Set U = H(V), where H is defined in (7). 3: Set X = G(U), where G is defined in (5). 4: Set X̃ = T_3(X), where T_3 is defined in (9).
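The sampler above depends on the paper's copula-like density c_θ (Proposition 1) and the maps H (eq. 7), G (eq. 5), and T_3 (eq. 9), none of which are reproduced in this report. The minimal Python sketch below therefore shows only the control flow of Algorithm 1; `sample_copula_like`, `H`, `G`, and `T3` are hypothetical placeholder names with identity-style stand-in bodies, not the paper's actual definitions.

```python
import numpy as np

# Structural skeleton of Algorithm 1 only. The real c_theta, H, G and T3 are
# defined in the paper (Proposition 1 and equations (7), (5), (9)); the bodies
# below are placeholders so the control flow is runnable.

def sample_copula_like(d, rng):
    """Placeholder for step 1: draw V = (V_1, ..., V_d) ~ c_theta."""
    return rng.uniform(size=d)          # stand-in, NOT the paper's c_theta

def H(v):
    """Placeholder for the map H in eq. (7)."""
    return v                            # stand-in

def G(u):
    """Placeholder for the map G in eq. (5)."""
    return u                            # stand-in

def T3(x):
    """Placeholder for the rotation T_3 in eq. (9)."""
    return x                            # stand-in

def sample_rotated_copula_like(d, seed=0):
    """Follow the four steps of Algorithm 1: sample V, then apply H, G, T3."""
    rng = np.random.default_rng(seed)
    v = sample_copula_like(d, rng)      # step 1
    u = H(v)                            # step 2
    x = G(u)                            # step 3
    return T3(x)                        # step 4

print(sample_rotated_copula_like(d=5))
```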
Open Source Code: No. The paper does not provide concrete access to source code for the methodology described.
Open Datasets: Yes. Inference with the proposed variational family is applied on commonly considered UCI regression datasets, repeating the experimental set-up used in [15]. ... We performed classification on MNIST using a 2-hidden-layer fully-connected network where the hidden layers are of size 200 each.
Dataset Splits: Yes. We choose the hyper-parameter σ_0^2 ∈ {0.01, 0.1, 1.0, 10.0, 100.0} that performed best on a validation dataset in terms of its predictive log-likelihood. ... Further algorithmic details are given in Appendix D. ... For MNIST, we use 50,000 training, 10,000 test, and 60,000 training-validation examples, and we run 100 epochs.
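As a rough illustration of the reported protocol (not the authors' code), the sketch below selects σ_0^2 from the stated grid by validation predictive log-likelihood and uses the quoted MNIST split sizes; `fit_and_score` is a made-up placeholder for training and evaluating the variational approximation.

```python
import numpy as np

# Hypothetical sketch of the quoted protocol: 60,000 training-validation examples
# split into 50,000 training and 10,000 validation, with sigma_0^2 chosen from the
# stated grid by validation predictive log-likelihood. fit_and_score is a dummy.

SIGMA0_SQ_GRID = [0.01, 0.1, 1.0, 10.0, 100.0]

def fit_and_score(sigma0_sq, train_idx, val_idx):
    """Placeholder: train with prior variance sigma0_sq and return the
    validation predictive log-likelihood."""
    return -abs(np.log10(sigma0_sq))    # dummy score so the sketch runs

n_train_val, val_size = 60_000, 10_000
perm = np.random.default_rng(0).permutation(n_train_val)
train_idx, val_idx = perm[:-val_size], perm[-val_size:]

best = max(SIGMA0_SQ_GRID, key=lambda s: fit_and_score(s, train_idx, val_idx))
print("selected sigma_0^2:", best)
```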
Hardware Specification: No. The paper mentions "the UCL Myriad High Throughput Computing Facility (Myriad@UCL)" in the Acknowledgements, but it does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies: No. The paper mentions using "Adam [33]" for optimization and "tensorflow probability [9]" for reparametrized gradients, but it does not provide specific version numbers for these software dependencies.
Experiment Setup: Yes. We choose the hyper-parameter σ_0^2 ∈ {0.01, 0.1, 1.0, 10.0, 100.0} that performed best on a validation dataset in terms of its predictive log-likelihood. Optimization was performed using Adam [33] with a learning rate of 0.002. ... We use neural networks with ReLU activation functions and one hidden layer of size 50 for all datasets, with the exception of the protein dataset, which uses a hidden layer of size 100. ... We performed classification on MNIST using a 2-hidden-layer fully-connected network where the hidden layers are of size 200 each. ... Learning rate 0.002, mini-batch size of 128 ... we run 100 epochs.
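For orientation only, here is a hedged Keras sketch matching the quoted MNIST configuration (two hidden layers of 200 units, Adam with learning rate 0.002, mini-batches of 128, 100 epochs). It trains a plain deterministic network and does not implement the paper's copula-like variational posterior over the weights.

```python
import tensorflow as tf

# Sketch of the quoted MNIST configuration only: architecture, optimizer and
# training hyper-parameters as stated above. NOT the paper's Bayesian /
# copula-like variational treatment of the weights.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation="relu"),   # hidden layer 1 (200 units)
    tf.keras.layers.Dense(200, activation="relu"),   # hidden layer 2 (200 units)
    tf.keras.layers.Dense(10),                       # class logits
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.002),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(x_train, y_train, batch_size=128, epochs=100,
          validation_data=(x_test, y_test))
```

Per the quote above, the UCI regression setup would instead use a single hidden layer of 50 units (100 for the protein dataset) with ReLU activations.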