Fast Black-box Variational Inference through Stochastic Trust-Region Optimization

Authors: Jeffrey Regier, Michael I. Jordan, Jon McAuliffe

NeurIPS 2017

Reproducibility assessment (variable, result, and LLM response):

Research Type: Experimental
  "We implemented TrustVI in the Stan framework and compared it to two alternatives: Automatic Differentiation Variational Inference (ADVI) and Hessian-free Stochastic Gradient Variational Inference (HFSGVI). ... TrustVI typically converged at least one order of magnitude faster than ADVI, demonstrating the value of stochastic second-order information. TrustVI often found substantially better variational distributions than HFSGVI, demonstrating that our convergence theory can matter in practice."

Researcher Affiliation: Academia
  Jeffrey Regier (jregier@cs.berkeley.edu), Michael I. Jordan (jordan@cs.berkeley.edu), Jon McAuliffe (jon@stat.berkeley.edu)

Pseudocode: Yes
  Algorithm 1: TrustVI

Open Source Code: No
  The paper states "We use the authors' Stan [21] implementation of ADVI, and implement the other two algorithms in Stan as well." and cites a GitHub repository for Stan example models [22]. However, it does not provide a link to the source code for TrustVI itself.

Open Datasets: Yes
  "Our study set comprises 183 statistical models and datasets from [22], an online repository of open-source Stan models and datasets." [22] Stan developers. https://github.com/stan-dev/example-models, 2017. [Online; accessed Jan 3, 2017; commit 6fbbf36f9d14ed69c7e6da2691a3dbe1e3d55dea].

Dataset Splits: No
  The paper does not specify training/validation/test splits. It mentions using "183 statistical models and datasets" and that "For our trials, the variational distribution is always mean-field multivariate Gaussian", but gives no split percentages or counts.

Hardware Specification: No
  The paper mentions "SIMD parallelism on modern CPUs and GPUs" but does not identify specific hardware such as CPU or GPU models.

Software Dependencies: No
  The paper states "We use the authors' Stan [21] implementation of ADVI, and implement the other two algorithms in Stan as well." Stan is named, but no version number is given for Stan or any other software library.

Experiment Setup: Yes
  "Each stochastic gradient is based on a minibatch of 256 samples of the variational distribution. The number of variational samples for stochastic Hessian-vector products and for estimates of change (85 and 128, respectively) are selected to match the degree of parallelism for stochastic gradient computations."
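The quoted setup (256 variational samples per stochastic gradient, 128 samples per estimate of change) can be illustrated with a toy sketch. This is not the authors' TrustVI: Algorithm 1 builds a stochastic quadratic model using Hessian-vector products, whereas the first-order caricature below merely steps along a stochastic reparameterization gradient of the ELBO and accepts or rejects the step from a fresh change estimate. Every function name below, and the standard-normal stand-in target, is invented for illustration.

```python
import numpy as np

def log_p(z):
    # Stand-in target: unnormalized standard-normal log density.
    # A Stan model's log joint would go here in the paper's setting.
    return -0.5 * np.sum(z * z, axis=1)

def elbo_estimate(mu, log_sigma, n, rng):
    # Monte Carlo ELBO for a mean-field Gaussian q via reparameterization;
    # the Gaussian entropy contributes sum(log_sigma) plus a constant.
    eps = rng.standard_normal((n, mu.size))
    z = mu + np.exp(log_sigma) * eps
    return log_p(z).mean() + np.sum(log_sigma)

def elbo_grad_estimate(mu, log_sigma, n, rng):
    # Reparameterization-trick gradient; the paper uses a minibatch of
    # 256 variational samples per stochastic gradient.
    eps = rng.standard_normal((n, mu.size))
    sigma = np.exp(log_sigma)
    z = mu + sigma * eps
    g = -z  # grad of log_p w.r.t. z for the standard-normal stand-in
    grad_mu = g.mean(axis=0)
    grad_log_sigma = (g * eps).mean(axis=0) * sigma + 1.0  # +1 from entropy
    return np.concatenate([grad_mu, grad_log_sigma])

def trust_region_step(theta, radius, rng, n_grad=256, n_change=128):
    # Propose a step of length `radius` along the stochastic gradient,
    # then accept/reject using a fresh 128-sample estimate of the change
    # in ELBO, growing or shrinking the trust region accordingly.
    d = theta.size // 2
    g = elbo_grad_estimate(theta[:d], theta[d:], n_grad, rng)
    prop = theta + radius * g / (np.linalg.norm(g) + 1e-12)
    old = elbo_estimate(theta[:d], theta[d:], n_change, rng)
    new = elbo_estimate(prop[:d], prop[d:], n_change, rng)
    if new > old:
        return prop, min(2.0 * radius, 4.0)  # accept and expand (capped)
    return theta, 0.5 * radius               # reject and shrink

# Optimize a 2-d mean-field Gaussian toward the stand-in target.
theta = np.array([2.0, -1.0, 1.0, 0.5])  # [mu, log_sigma]
radius = 1.0
rng = np.random.default_rng(0)
for _ in range(200):
    theta, radius = trust_region_step(theta, radius, rng)
```

For this target the optimal variational parameters are mu = 0 and log_sigma = 0, so the loop should drive the (noisy) iterates toward that point while the radius adapts to the accept/reject outcomes.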