VarGrad: A Low-Variance Gradient Estimator for Variational Inference

Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco J. R. Ruiz, Ömer Deniz Akyildiz

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete variational autoencoder (VAE). In order to verify the properties of VarGrad empirically, we test it on two popular models: a Bayesian logistic regression model on a synthetic dataset and a discrete variational autoencoder (DVAE).
Researcher Affiliation | Collaboration | Lorenz Richter (Freie Universität Berlin; BTU Cottbus-Senftenberg; dida Datenschmiede GmbH), lorenz.richter@fu-berlin.de; Ayman Boustati (University of Warwick), a.boustati@warwick.ac.uk; Nikolas Nüsken (Universität Potsdam), nuesken@uni-potsdam.de; Francisco J. R. Ruiz (DeepMind), franrruiz@google.com; Ömer Deniz Akyildiz (University of Warwick; The Alan Turing Institute), omer.akyildiz@warwick.ac.uk
Pseudocode | Yes | Algorithm 1: Pseudocode for VarGrad (a JAX sketch of the corresponding surrogate loss appears after the table).
Open Source Code | Yes | Code in JAX [Bradbury et al., 2018, Hennigan et al., 2020] is available at https://github.com/aboustati/vargrad.
Open Datasets | Yes | In order to verify the properties of VarGrad empirically, we test it on two popular models: a Bayesian logistic regression model on a synthetic dataset and a discrete variational autoencoder (DVAE) on a fixed binarisation of Omniglot [Lake et al., 2015].
Dataset Splits | Yes | For logistic regression, we used a synthetic dataset of 1,000 observations and 200 dimensions. We split the data into 80% training and 20% test samples.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'Code in JAX' and refers to JAX and Haiku in the references, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | The negative ELBO is computed on the standard test split, and optimisation uses Adam [Kingma and Ba, 2015] with a learning rate of 1e-3. For logistic regression, we used a batch size of 256 and trained for 1,000 epochs; for the DVAE, we used a batch size of 200 and trained for 100 epochs (linear DVAE) or 500 epochs (non-linear DVAE). For all experiments, we used 4 Monte Carlo samples for gradient estimation. (See the configuration sketch after the table.)
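
To illustrate what Algorithm 1 computes, below is a minimal JAX sketch of the VarGrad surrogate ("log-variance") loss. The function names sample_q, log_q, and log_joint are placeholders supplied by the user, not identifiers from the released code; differentiating the empirical variance of the log-weights log q_phi(z) - log p(x, z), with the samples held fixed, yields the VarGrad gradient estimate.

```python
import jax
import jax.numpy as jnp


def vargrad_loss(phi, rng, log_joint, sample_q, log_q, num_samples=4):
    """Surrogate 'log-variance' loss whose gradient w.r.t. phi is the VarGrad estimate.

    phi       : variational parameters (any pytree)
    log_joint : z -> log p(x, z) for the current data
    sample_q  : (phi, key) -> one sample z ~ q_phi
    log_q     : (phi, z) -> log q_phi(z)
    """
    keys = jax.random.split(rng, num_samples)
    # Draw Monte Carlo samples from q_phi; stop_gradient ensures that only the
    # explicit dependence of log q_phi on phi is differentiated, not the sampling.
    zs = jax.lax.stop_gradient(jax.vmap(lambda k: sample_q(phi, k))(keys))
    log_w = jax.vmap(lambda z: log_q(phi, z) - log_joint(z))(zs)
    # Empirical variance of the log-weights (ddof=1 gives the unbiased sample variance).
    return jnp.var(log_w, ddof=1)


# Gradient of the surrogate loss with respect to phi.
vargrad_estimate = jax.grad(vargrad_loss)
```

The stop_gradient is the essential ingredient: no reparameterisation of the samples is required, which is what makes the estimator applicable to discrete latent variables such as those in the DVAE experiments.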
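For concreteness, a hypothetical training step matching the reported hyperparameters (Adam, learning rate 1e-3, 4 Monte Carlo samples per gradient) could be set up as follows, building on the vargrad_loss sketch above and using Optax for the optimiser. The released code may be organised differently, and log_joint_for_batch, sample_q, and log_q are again placeholder names.

```python
import jax
import optax

LEARNING_RATE = 1e-3   # reported for both experiments
NUM_MC_SAMPLES = 4     # Monte Carlo samples per gradient estimate
BATCH_SIZE = 256       # 200 for the DVAE experiments

optimizer = optax.adam(LEARNING_RATE)


@jax.jit
def train_step(phi, opt_state, rng, batch):
    # log_joint_for_batch is a hypothetical closure returning z -> log p(batch, z).
    loss, grads = jax.value_and_grad(vargrad_loss)(
        phi, rng, log_joint_for_batch(batch), sample_q, log_q,
        num_samples=NUM_MC_SAMPLES)
    updates, opt_state = optimizer.update(grads, opt_state, phi)
    phi = optax.apply_updates(phi, updates)
    return phi, opt_state, loss
```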