Provable Gradient Variance Guarantees for Black-Box Variational Inference
Authors: Justin Domke
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Empirical Evaluation: Fig. 1 shows the evolution of the ELBO along with the variance of gradient estimation, either in batch mode (using the full dataset in each evaluation) or uniform mode (stochastically, with p(n) = 1/N) (see the variance sketch after the table). |
| Researcher Affiliation | Academia | Justin Domke College of Information and Computer Sciences University of Massachusetts Amherst domke@cs.umass.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Table 1: Regression (r) and classification (c) datasets (name, type, # data, # dims): boston (r, 506, 13); fires (r, 517, 12); cpusmall (r, 8192, 13); a1a (c, 1695, 124); ionosphere (c, 351, 35); australian (c, 690, 15); sonar (c, 208, 61); mushrooms (c, 8124, 113). |
| Dataset Splits | No | The paper mentions training and testing contexts (e.g., 'To enable a clear comparison of different estimators and bounds, we generate a single optimization trace of parameter vectors w for each dataset.'), but it does not specify explicit dataset splits such as percentages or sample counts for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for conducting the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for libraries, frameworks, or programming languages used in the experiments. |
| Experiment Setup | Yes | To enable a clear comparison of different estimators and bounds, we generate a single optimization trace of parameter vectors w for each dataset. All comparisons use this same trace. These use a conservative optimization method: find a maximum point ẑ and then initialize to w = (ẑ, 0). Then, optimization uses proximal stochastic gradient descent (with the proximal operator reflecting h) with a step size of 1/M (the scalar smoothness constant) and 1000 evaluations for each gradient estimate (see the proximal-SGD sketch after the table). |
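
For the empirical-evaluation row above, the following is a minimal sketch of how gradient-estimator variance could be measured in the two modes the response quotes: batch (full dataset per evaluation) and uniform (a single datum n sampled with p(n) = 1/N). The dataset, the per-datum gradient, and its noise term are placeholders, not the paper's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 506, 13                       # hypothetical sizes (e.g. the boston set)
X = rng.standard_normal((N, D))      # placeholder data, not the real dataset

def grad_term(w, x):
    """Placeholder per-datum gradient; the noise stands in for the
    reparameterization randomness. Not the paper's estimator."""
    return x - w + rng.standard_normal(w.shape)

def grad_batch(w):
    """'Batch' mode: use the full dataset in each evaluation."""
    return np.mean([grad_term(w, x) for x in X], axis=0)

def grad_uniform(w):
    """'Uniform' mode: one datum n sampled with p(n) = 1/N."""
    return grad_term(w, X[rng.integers(N)])

def empirical_variance(grad_fn, w, repeats=1000):
    """Trace of the empirical covariance over repeated evaluations."""
    g = np.stack([grad_fn(w) for _ in range(repeats)])
    return g.var(axis=0).sum()

w = np.zeros(D)
print("batch:  ", empirical_variance(grad_batch, w))
print("uniform:", empirical_variance(grad_uniform, w))
```

And for the experiment-setup row, a minimal sketch of generating an optimization trace with proximal stochastic gradient descent at step size 1/M, averaging 1000 evaluations per gradient estimate. The proximal operator, the gradient estimator, the maximum point `z_hat`, and the smoothness constant `M` are all hypothetical stand-ins, since the paper's model and regularizer h are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def prox_h(w, step):
    """Hypothetical proximal operator for the regularizer h
    (soft-thresholding, purely illustrative)."""
    return np.sign(w) * np.maximum(np.abs(w) - step, 0.0)

def grad_estimate(w, evaluations=1000):
    """Placeholder stochastic gradient, averaged over 1000 evaluations
    as quoted above; not the paper's ELBO gradient."""
    g = np.stack([w + rng.standard_normal(w.shape) for _ in range(evaluations)])
    return g.mean(axis=0)

def proximal_sgd(w_init, M, iters=200):
    """Proximal stochastic gradient descent with step size 1/M,
    where M is the scalar smoothness constant."""
    w, step, trace = w_init.copy(), 1.0 / M, []
    for _ in range(iters):
        w = prox_h(w - step * grad_estimate(w), step)  # gradient step, then prox
        trace.append(w.copy())
    return trace

# Conservative initialization as described: w = (z_hat, 0),
# with z_hat a hypothetical maximum (MAP) point.
z_hat = np.zeros(13)
w0 = np.concatenate([z_hat, np.zeros(13)])
trace = proximal_sgd(w0, M=10.0)
```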