Approximation Based Variance Reduction for Reparameterization Gradients
Authors: Tomas Geffner, Justin Domke
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that this control variate leads to large improvements in gradient variance and optimization convergence for inference with non-factorized variational distributions. |
| Researcher Affiliation | Academia | Tomas Geffner, College of Information and Computer Science, University of Massachusetts, Amherst (tgeffner@cs.umass.edu); Justin Domke, College of Information and Computer Science, University of Massachusetts, Amherst (domke@cs.umass.edu) |
| Pseudocode | Yes | Algorithm 1: SGVI with the proposed control variate. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology. |
| Open Datasets | Yes | We use three different models: Logistic regression with the a1a dataset, hierarchical regression with the frisk dataset [7], and a Bayesian neural network with the red wine dataset. The latter two are the ones used by Miller et al. [17]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits needed to reproduce the experiment. |
| Hardware Specification | Yes | we use PyTorch 1.1.0 on an Intel i5 2.3 GHz |
| Software Dependencies | Yes | we use PyTorch 1.1.0 on an Intel i5 2.3 GHz |
| Experiment Setup | Yes | We use Adam [13] to optimize the parameters w of the variational distribution q_w (with step sizes between 10^-5 and 10^-2). We use Adam with a step size of 0.01 to optimize the parameters v of the control variate, by minimizing the proxy to the variance from Eq. 12. We parameterize B_v as a diagonal plus rank-r_v matrix. We set r_v = 10 when diagonal or diagonal plus low rank variational distributions are used, and r_v = 20 when a full-rank variational distribution is used. We use M = 10 and M = 50 samples from q_w to estimate gradients. (See the illustrative sketch after this table.) |
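
The Experiment Setup row above translates naturally into a two-optimizer training loop. The sketch below is a minimal, illustrative rendering of that configuration in PyTorch, assuming a diagonal Gaussian variational distribution and a placeholder `log_joint`; the latent dimension, the particular learning rate chosen from the quoted range, and the omitted control-variate correction and variance-proxy update (the paper's Eq. 12) are stand-ins, not the authors' implementation.

```python
import math
import torch

# Minimal sketch of the quoted setup (not the authors' code): a diagonal
# Gaussian q_w trained by SGVI with reparameterization gradients, one Adam
# optimizer for the variational parameters w and a second Adam optimizer
# (step size 0.01) reserved for the control-variate parameters v.

D = 10        # latent dimension (hypothetical)
M = 10        # Monte Carlo samples per gradient estimate (paper: M = 10 or 50)
r_v = 10      # rank of the low-rank part of B_v (paper: 10 or 20)


def log_joint(z):
    """Placeholder log p(x, z): standard normal. Replace with the model's log joint."""
    return -0.5 * (z ** 2).sum(dim=-1)


# Variational parameters w of q_w = N(mean, diag(exp(log_std))^2).
mean = torch.zeros(D, requires_grad=True)
log_std = torch.zeros(D, requires_grad=True)

# Control-variate parameters v: B_v parameterized as diagonal plus rank r_v.
B_diag = torch.zeros(D, requires_grad=True)
B_low_rank = torch.zeros(D, r_v, requires_grad=True)

opt_w = torch.optim.Adam([mean, log_std], lr=1e-3)       # quoted range: 1e-5 to 1e-2
opt_v = torch.optim.Adam([B_diag, B_low_rank], lr=0.01)  # quoted step size 0.01

for step in range(1000):
    eps = torch.randn(M, D)               # reparameterization noise
    z = mean + log_std.exp() * eps        # z = T_w(eps)

    # Negative ELBO with the closed-form entropy of a diagonal Gaussian.
    entropy = log_std.sum() + 0.5 * D * (1.0 + math.log(2.0 * math.pi))
    loss = -(log_joint(z).mean() + entropy)

    opt_w.zero_grad()
    loss.backward()
    opt_w.step()

    # The paper additionally subtracts its control variate from the gradient
    # estimate and updates v with opt_v by minimizing a proxy to the gradient
    # variance (its Eq. 12); that correction depends on the paper's specific
    # approximation and is omitted from this sketch.
```

Keeping two separate Adam optimizers matches the quoted configuration: the control-variate parameters v are updated at a fixed step size of 0.01 regardless of which step size in [10^-5, 10^-2] is used for the variational parameters w.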