Yes, but Did It Work?: Evaluating Variational Inference
Authors: Yuling Yao, Aki Vehtari, Daniel Simpson, Andrew Gelman
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness-of-fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. From Section 4 (Applications): Both PSIS and VSBC diagnostics are applicable to any variational inference algorithm. Without loss of generality, we implement mean-field Gaussian automatic differentiation variational inference (ADVI) in this section. |
| Researcher Affiliation | Academia | ¹Department of Statistics, Columbia University, NY, USA; ²Helsinki Institute for Information Technology, Department of Computer Science, Aalto University, Finland; ³Department of Statistical Sciences, University of Toronto, Canada. |
| Pseudocode | Yes | Algorithm 1 PSIS diagnostic and Algorithm 2 VSBC marginal diagnostics |
| Open Source Code | No | The paper does not provide an explicit statement or link regarding the release of open-source code for the described methodology. |
| Open Datasets | Yes | The Eight-School Model (Gelman et al., 2013, Section 5.5) is the simplest Bayesian hierarchical normal model. |
| Dataset Splits | No | The paper mentions 'holding out an independent test dataset' in a general discussion but does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU/CPU models, memory specifications) for running its experiments. |
| Software Dependencies | No | The paper mentions implementing 'mean-field Gaussian automatic differentiation variational inference (ADVI)' but does not specify version numbers for any software, libraries, or solvers used. |
| Experiment Setup | Yes | As displayed in the left panel of Figure 2, changing the threshold of relative ELBO change from a conservative $10^{-5}$ to the default recommendation $10^{-2}$ increases $\hat{k}$ to 4.4, even though $10^{-2}$ works fine for many other simpler problems. |
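
The PSIS diagnostic named in the Research Type and Pseudocode rows can be sketched in NumPy as below. This is not the authors' code (the paper releases none). Assumptions of this sketch: the generalized Pareto fit follows the Zhang and Stephens (2009) empirical-Bayes estimator as commonly used in PSIS implementations; the tail size min(0.2S, 3√S) and the 0.5 / 0.7 cut-offs in the hypothetical helper `interpret_khat` follow the guidance usually quoted for PSIS; and the two Gaussian examples are toy stand-ins for a real joint density p(θ, y) and a fitted q(θ).

```python
import numpy as np


def gpd_fit_zhang_stephens(x, prior_k=10.0, prior_bs=3.0):
    """Fit a generalized Pareto distribution to positive exceedances `x`.

    Sketch of the Zhang & Stephens (2009) empirical-Bayes estimator plus the
    weak shape prior that PSIS implementations commonly add (an assumption).
    `x` must be sorted ascending and strictly positive. Returns (k, sigma).
    """
    n = x.size
    m = 30 + int(np.sqrt(n))
    j = np.arange(1, m + 1)
    # Grid of candidate values for theta = -k / sigma (all below 1 / max(x)).
    theta = (1.0 - np.sqrt(m / (j - 0.5))) / (prior_bs * x[int(n / 4 + 0.5) - 1])
    theta += 1.0 / x[-1]
    # Profile shape estimate and profile log-likelihood at each grid point.
    k_grid = np.log1p(-theta[:, None] * x).mean(axis=1)
    loglik = n * (np.log(-theta / k_grid) - k_grid - 1.0)
    # Posterior weights of the grid points: a softmax of the log-likelihoods.
    w = np.exp(loglik - loglik.max())
    w /= w.sum()
    theta_post = np.sum(w * theta)
    k = np.log1p(-theta_post * x).mean()
    sigma = -k / theta_post
    # Shrink k towards 0.5 with a prior worth `prior_k` observations.
    k = (n * k + prior_k * 0.5) / (n + prior_k)
    return k, sigma


def psis_khat(log_p, log_q):
    """Pareto shape k-hat of the importance ratios r_s = p(theta_s, y) / q(theta_s).

    `log_p` may be unnormalised: k-hat does not depend on a constant factor.
    """
    log_r = np.asarray(log_p, dtype=float) - np.asarray(log_q, dtype=float)
    log_r = np.sort(log_r - log_r.max())              # rescale to avoid overflow
    S = log_r.size
    M = int(np.ceil(min(0.2 * S, 3.0 * np.sqrt(S))))  # tail size (assumption)
    cutoff = log_r[-M - 1]                            # (M+1)-th largest log ratio
    tail = np.exp(log_r[-M:]) - np.exp(cutoff)        # ascending exceedances
    k_hat, _ = gpd_fit_zhang_stephens(tail)
    return k_hat


def interpret_khat(k_hat):
    """Hypothetical helper mapping k-hat to the usual PSIS reading."""
    if k_hat < 0.5:
        return "reliable"
    if k_hat < 0.7:
        return "usable, but refine with PSIS"
    return "unreliable"


def log_normal(x, scale):
    # log N(x | 0, scale^2) without the constant -0.5 * log(2 * pi)
    return -0.5 * (x / scale) ** 2 - np.log(scale)


rng = np.random.default_rng(1)
S = 100_000
# Toy target p = N(0, 1). An under-dispersed q = N(0, 0.1^2) gives heavy-tailed ratios...
draws = rng.normal(0.0, 0.1, size=S)
k_bad = psis_khat(log_normal(draws, 1.0), log_normal(draws, 0.1))
# ...while an over-dispersed q = N(0, 2^2) gives bounded ratios.
draws = rng.normal(0.0, 2.0, size=S)
k_good = psis_khat(log_normal(draws, 1.0), log_normal(draws, 2.0))
print(f"q too narrow: k_hat = {k_bad:.2f} ({interpret_khat(k_bad)})")
print(f"q too wide:   k_hat = {k_good:.2f} ({interpret_khat(k_good)})")
```

Because only the tail shape of the ratios matters, the diagnostic can use the unnormalized p(θ, y) in place of the posterior. In this toy setting the under-dispersed q, which mimics the variance under-estimation typical of mean-field VI, should yield a clearly larger k̂ than the over-dispersed one.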
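
The VSBC idea from the Research Type row (checking whether point estimates are right on average) can be sketched as follows, under stated assumptions. For each replication m, draw θ₀ from the prior, simulate data given θ₀, fit an approximation q, and record p_m = Pr_q(θ < θ₀); if the approximation's point estimates are unbiased on average, the p_m should be distributed symmetrically around 0.5. Assumptions of this sketch: the hypothetical `toy_fit` replaces a real ADVI run with an exact conjugate posterior whose mean is shifted by `bias` standard deviations, and symmetry is judged with a two-sample Kolmogorov-Smirnov statistic between p and 1 − p. Since p and 1 − p are paired rather than independent, the 1.36·√(2/n) cut-off is only a rough guide.

```python
import numpy as np


def toy_fit(rng, n_obs, bias):
    """HYPOTHETICAL stand-in for running VI on one simulated data set.

    Model: theta ~ N(0, 1), y_i | theta ~ N(theta, 1) for i = 1..n_obs.
    The 'approximation' is the exact Gaussian posterior with its mean shifted
    by `bias` posterior standard deviations (bias=0 means an exact fit).
    """
    theta0 = rng.normal()                      # theta_0 drawn from the prior
    y = rng.normal(theta0, 1.0, size=n_obs)    # data simulated given theta_0
    post_sd = np.sqrt(1.0 / (n_obs + 1.0))
    post_mean = y.sum() / (n_obs + 1.0)
    return theta0, post_mean + bias * post_sd, post_sd


def vsbc_probabilities(rng, n_rep=500, n_obs=10, bias=0.0, n_draws=1000):
    """p_m = Pr_q(theta < theta_0^(m)), estimated from draws of each q^(m)."""
    p = np.empty(n_rep)
    for m in range(n_rep):
        theta0, q_mean, q_sd = toy_fit(rng, n_obs, bias)
        draws = rng.normal(q_mean, q_sd, size=n_draws)
        p[m] = np.mean(draws < theta0)
    return p


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))


def vsbc_check(p):
    """Compare p with 1 - p; a large statistic suggests biased point estimates."""
    d = ks_statistic(p, 1.0 - p)
    crit = 1.36 * np.sqrt(2.0 / p.size)        # rough 5% two-sample KS cut-off
    return d, bool(d < crit)


rng = np.random.default_rng(2)
d_exact, sym_exact = vsbc_check(vsbc_probabilities(rng, bias=0.0))
d_biased, sym_biased = vsbc_check(vsbc_probabilities(rng, bias=1.0))
print(f"bias=0: D={d_exact:.3f}, symmetric={sym_exact}")
print(f"bias=1: D={d_biased:.3f}, symmetric={sym_biased}")
```

Note that VSBC looks at symmetry rather than uniformity: a mean-field approximation that gets the means right but the variances wrong can still pass, which is why it complements the joint PSIS check rather than replacing it.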
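
The stopping threshold in the Experiment Setup row is a bound on the relative ELBO change, |ELBO_t − ELBO_{t−1}| / |ELBO_t|. The minimal sketch below, using a made-up ELBO trace, shows why the looser $10^{-2}$ threshold stops optimization earlier than $10^{-5}$. It is a simplified, hypothetical rule: production ADVI implementations (Stan's, for instance) smooth the relative change over a window of recent ELBO evaluations rather than comparing only two consecutive values.

```python
def relative_change(prev, curr):
    """Relative ELBO change |ELBO_t - ELBO_{t-1}| / |ELBO_t|."""
    return abs(curr - prev) / abs(curr)


def first_stop(elbo_trace, tol):
    """Index of the first iteration whose relative change falls below `tol`.

    Returns None if the trace never meets the threshold.
    """
    for t in range(1, len(elbo_trace)):
        if relative_change(elbo_trace[t - 1], elbo_trace[t]) < tol:
            return t
    return None


# Hypothetical ELBO trace: large early gains, then slow creeping improvement.
trace = [-1000.0, -900.0, -895.0, -894.99, -894.9899]
print(first_stop(trace, 1e-2))  # default-style threshold -> 2
print(first_stop(trace, 1e-5))  # conservative threshold  -> 4
```

Stopping early is cheap, but as the Figure 2 example in the row above indicates, the resulting approximation can be much worse, which is exactly what a large $\hat{k}$ flags.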