Provable Gradient Variance Guarantees for Black-Box Variational Inference

Authors: Justin Domke

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical Evaluation: Fig. 1 shows the evolution of the ELBO along with the variance of the gradient estimator, both in batch mode (using the full dataset in each evaluation) and in uniform mode (sampling a single datapoint with probability 1/N). (A toy sketch of this batch-vs-uniform comparison follows the table.)
Researcher Affiliation | Academia | Justin Domke, College of Information and Computer Sciences, University of Massachusetts Amherst, domke@cs.umass.edu
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Table 1 lists the regression (r) and classification (c) datasets with their sizes (# data, # dims): boston (r, 506, 13), fires (r, 517, 12), cpusmall (r, 8192, 13), a1a (c, 1695, 124), ionosphere (c, 351, 35), australian (c, 690, 15), sonar (c, 208, 61), mushrooms (c, 8124, 113).
Dataset Splits | No | The paper mentions training and testing contexts (e.g., 'To enable a clear comparison of different estimators and bounds, we generate a single optimization trace of parameter vectors w for each dataset.'), but it does not specify explicit dataset splits, such as percentages or sample counts for training, validation, and testing.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for conducting the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | To enable a clear comparison of different estimators and bounds, we generate a single optimization trace of parameter vectors w for each dataset. All comparisons use this same trace. These use a conservative optimization method: find a maximum z and then initialize to w = (z, 0). Then, optimization uses proximal stochastic gradient descent (with the proximal operator reflecting h), a step size of 1/M (where M is the scalar smoothness constant), and 1000 evaluations for each gradient estimate. (A proximal-SGD sketch under these settings follows the table.)
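
The "Research Type" row above contrasts a batch gradient estimator with a uniform single-datapoint estimator sampled with probability 1/N. The following is a minimal sketch, not the paper's code: a hypothetical quadratic per-datapoint objective stands in for the ELBO terms, and the variance of the stochastic estimator is measured empirically by resampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decomposable objective f(w) = (1/N) * sum_n f_n(w).
# (Hypothetical stand-in for the per-datapoint terms in the paper.)
N, D = 100, 5
A = rng.normal(size=(N, D))
b = rng.normal(size=N)

def grad_n(w, n):
    """Gradient of the n-th term f_n(w) = 0.5 * (A[n] @ w - b[n])**2."""
    return (A[n] @ w - b[n]) * A[n]

def batch_grad(w):
    """'Batch' estimator: average the exact per-datapoint gradients (no sampling noise over n)."""
    return np.mean([grad_n(w, n) for n in range(N)], axis=0)

def uniform_grad(w):
    """'Uniform' estimator: sample one datapoint index n with probability 1/N (unbiased)."""
    n = rng.integers(N)
    return grad_n(w, n)

def empirical_variance(estimator, w, num_samples=1000):
    """Total variance (sum of per-coordinate variances) of a stochastic gradient estimator."""
    samples = np.stack([estimator(w) for _ in range(num_samples)])
    return samples.var(axis=0).sum()

w = rng.normal(size=D)
print("batch gradient    :", np.round(batch_grad(w), 3))
print("uniform estimator variance:", empirical_variance(uniform_grad, w))
```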
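The "Experiment Setup" row describes proximal stochastic gradient descent with a step size of 1/M and an initialization of the form w = (z, 0). The sketch below is an assumption-laden illustration: the smooth least-squares loss, the choice h(w) = lam * ||w||_1 (with its soft-thresholding prox), and the placeholder z are hypothetical stand-ins for the paper's actual variational objective, non-smooth term h, and maximum point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy smooth loss with scalar smoothness constant M, plus a non-smooth term h
# handled via its proximal operator. Loss and h are illustrative, not the paper's objective.
N, D = 200, 10
A = rng.normal(size=(N, D))
b = rng.normal(size=N)
lam = 0.1

M = np.linalg.eigvalsh(A.T @ A / N).max()   # smoothness constant of the smooth part
step = 1.0 / M                              # step size 1/M, as in the described setup

def stochastic_grad(w, num_samples=1000):
    """Average `num_samples` single-datapoint gradients
    (mimicking the 1000 evaluations per gradient estimate mentioned in the setup)."""
    idx = rng.integers(N, size=num_samples)
    residual = A[idx] @ w - b[idx]
    return (residual[:, None] * A[idx]).mean(axis=0)

def prox_h(w, gamma):
    """Proximal operator of gamma * h for h = lam * ||.||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - gamma * lam, 0.0)

# Initialization in the spirit of w = (z, 0): a previously found point z (here just zeros
# as a placeholder) concatenated with zeros for the remaining parameters.
z = np.zeros(D // 2)
w = np.concatenate([z, np.zeros(D - z.size)])

# Proximal SGD: gradient step on the smooth part, prox step on h.
for _ in range(500):
    w = prox_h(w - step * stochastic_grad(w), step)

print("final w:", np.round(w, 3))
```

The 1/M step size mirrors the conservative choice described in the setup: for an M-smooth loss it guarantees monotone expected progress without tuning, at the cost of slower convergence than a tuned schedule.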