VarGrad: A Low-Variance Gradient Estimator for Variational Inference
Authors: Lorenz Richter, Ayman Boustati, Nikolas Nüsken, Francisco Ruiz, Omer Deniz Akyildiz
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete variational autoencoder (VAE). In order to verify the properties of VarGrad empirically, we test it on two popular models: a Bayesian logistic regression model on a synthetic dataset and a discrete variational autoencoder (DVAE). |
| Researcher Affiliation | Collaboration | Lorenz Richter (Freie Universität Berlin; BTU Cottbus-Senftenberg; dida Datenschmiede GmbH) lorenz.richter@fu-berlin.de; Ayman Boustati (University of Warwick) a.boustati@warwick.ac.uk; Nikolas Nüsken (Universität Potsdam) nuesken@uni-potsdam.de; Francisco J. R. Ruiz (DeepMind) franrruiz@google.com; Ömer Deniz Akyildiz (University of Warwick; The Alan Turing Institute) omer.akyildiz@warwick.ac.uk |
| Pseudocode | Yes | Algorithm 1: Pseudocode for VarGrad (see the JAX sketch after this table) |
| Open Source Code | Yes | Code in JAX [Bradbury et al., 2018, Hennigan et al., 2020] is available at https://github.com/aboustati/vargrad. |
| Open Datasets | Yes | In order to verify the properties of Var Grad empirically, we test it on two popular models: a Bayesian logistic regression model on a synthetic dataset and a discrete variational autoencoder (DVAE) on a fixed binarisation of Omniglot [Lake et al., 2015]. |
| Dataset Splits | Yes | For logistic regression, we used a synthetic dataset of 1,000 observations and 200 dimensions. We split the data into 80% training and 20% test samples. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Code in JAX' and refers to JAX and Haiku in the references, but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The negative ELBO is computed on the standard test split and the optimisation uses Adam [Kingma and Ba, 2015] with a learning rate of 0.001. For logistic regression, we used a batch size of 256 and trained with Adam at a learning rate of 1e-3 for 1,000 epochs. For all experiments, we used 4 Monte Carlo samples for gradient estimation. For the DVAE, we used a batch size of 200 and the Adam optimizer with a learning rate of 1e-3, running for 100 epochs for the linear DVAE and 500 epochs for the non-linear DVAE (a training-step sketch follows after the table). |
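
To complement the Algorithm 1 row, here is a minimal sketch of the VarGrad surrogate in JAX. The callables `sample_q`, `log_q`, and `log_joint` are hypothetical placeholders, not the API of the released repository at https://github.com/aboustati/vargrad; the idea is that differentiating the sample variance of the log-ratio, with the sampling path detached, yields (up to a constant factor) the VarGrad / leave-one-out REINFORCE gradient estimator described in the paper.

```python
# Minimal VarGrad surrogate sketch in JAX. The callables sample_q, log_q,
# and log_joint are hypothetical placeholders, not the released code.
import jax
import jax.numpy as jnp

def vargrad_surrogate(phi, key, sample_q, log_q, log_joint, num_samples=4):
    """Sample variance of log q_phi(z) - log p(x, z) over S samples.

    The samples are detached via stop_gradient, so jax.grad of this
    surrogate w.r.t. phi gives (up to a constant factor) the VarGrad
    estimator, i.e. REINFORCE with a leave-one-out control variate.
    """
    keys = jax.random.split(key, num_samples)
    z = jax.vmap(lambda k: sample_q(phi, k))(keys)   # S samples from q_phi
    z = jax.lax.stop_gradient(z)                     # keep only the score-function path
    log_ratio = jax.vmap(lambda zi: log_q(phi, zi) - log_joint(zi))(z)
    return jnp.var(log_ratio, ddof=1)                # unbiased sample variance

# Example gradient estimate w.r.t. the variational parameters phi:
# grad_phi = jax.grad(vargrad_surrogate)(phi, key, sample_q, log_q, log_joint)
```

In the paper's framing this corresponds to the log-variance loss evaluated at r = q; in practice one simply minimises the surrogate above with automatic differentiation.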
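
The Experiment Setup row translates into a training-step skeleton along the following lines. The use of optax, the parameter shape, and the toy loss are assumptions made for illustration; only the hyperparameters (Adam, learning rate 1e-3, 4 Monte Carlo samples) come from the table above.

```python
# Hypothetical training-step skeleton for the reported setup: Adam with
# learning rate 1e-3 and 4 Monte Carlo samples per gradient estimate.
# optax, the parameter shape, and the toy loss are assumptions.
import jax
import jax.numpy as jnp
import optax

def loss_fn(phi, key, num_samples=4):
    # Stand-in for the VarGrad surrogate sketched above; a toy quadratic
    # so the skeleton runs end to end.
    noise = jax.random.normal(key, (num_samples,))
    return jnp.mean((phi.sum() - noise) ** 2)

phi = jnp.zeros(10)                          # toy variational parameters
optimizer = optax.adam(learning_rate=1e-3)   # Adam, lr 1e-3 as reported
opt_state = optimizer.init(phi)

@jax.jit
def train_step(phi, opt_state, key):
    loss, grads = jax.value_and_grad(loss_fn)(phi, key)
    updates, opt_state = optimizer.update(grads, opt_state)
    phi = optax.apply_updates(phi, updates)
    return phi, opt_state, loss

# One optimisation step:
phi, opt_state, loss = train_step(phi, opt_state, jax.random.PRNGKey(0))
```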