Yes, but Did It Work?: Evaluating Variational Inference

Authors: Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman

ICML 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness of fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. From Section 4, Applications: Both PSIS and VSBC diagnostics are applicable to any variational inference algorithm. Without loss of generality, we implement mean-field Gaussian automatic differentiation variational inference (ADVI) in this section. |
| Researcher Affiliation | Academia | (1) Department of Statistics, Columbia University, NY, USA; (2) Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Finland; (3) Department of Statistical Sciences, University of Toronto, Canada. |
| Pseudocode | Yes | Algorithm 1 (PSIS diagnostic) and Algorithm 2 (VSBC marginal diagnostics) |
| Open Source Code | No | The paper does not provide an explicit statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | Yes | The Eight-School Model (Gelman et al., 2013, Section 5.5) is the simplest Bayesian hierarchical normal model. |
| Dataset Splits | No | The paper mentions 'holding out an independent test dataset' in a general discussion but does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory specifications) for running its experiments. |
| Software Dependencies | No | The paper mentions implementing 'mean-field Gaussian automatic differentiation variational inference (ADVI)' but does not specify version numbers for any software, libraries, or solvers used. |
| Experiment Setup | Yes | As displayed in the left panel of Figure 2, changing the threshold of relative ELBO change from a conservative 10^-5 to the default recommendation 10^-2 increases k̂ to 4.4, even though 10^-2 works fine for many other simpler problems. |
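
The Experiment Setup entry turns on a "threshold of relative ELBO change". As a rough illustration of what such a stopping rule does, the sketch below compares consecutive ELBO values and stops once their relative change drops below a tolerance. It is a simplified assumption, not the paper's or any library's implementation: real ADVI implementations may average the change over a window of iterations. The function names and the ELBO trace are hypothetical, and the numbers are made up for the example.

```python
def relative_change(prev, curr):
    """Relative change between two consecutive ELBO values.

    Assumes curr is nonzero; a production rule would guard against that.
    """
    return abs(curr - prev) / abs(curr)


def first_converged_step(elbo_trace, tol):
    """Return the index of the first ELBO value whose relative change from
    its predecessor is below tol, or None if that never happens."""
    for i in range(1, len(elbo_trace)):
        if relative_change(elbo_trace[i - 1], elbo_trace[i]) < tol:
            return i
    return None


# Hypothetical ELBO trace (illustrative values, not taken from the paper).
elbo_trace = [-100.0, -50.0, -40.0, -39.8, -39.79, -39.7899]

loose = first_converged_step(elbo_trace, 1e-2)   # looser tolerance
strict = first_converged_step(elbo_trace, 1e-5)  # conservative tolerance
print(loose, strict)
```

On this trace the looser tolerance declares convergence earlier than the conservative one, which is the mechanism behind the quoted observation: stopping early can leave the variational approximation far enough from the posterior for the PSIS k̂ diagnostic to flag it.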