Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
Authors: Jeffrey Regier, Michael I. Jordan, Jon McAuliffe
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented TrustVI in the Stan framework and compared it to two alternatives: Automatic Differentiation Variational Inference (ADVI) and Hessian-free Stochastic Gradient Variational Inference (HFSGVI). ... TrustVI typically converged at least one order of magnitude faster than ADVI, demonstrating the value of stochastic second-order information. TrustVI often found substantially better variational distributions than HFSGVI, demonstrating that our convergence theory can matter in practice. |
| Researcher Affiliation | Academia | Jeffrey Regier (jregier@cs.berkeley.edu), Michael I. Jordan (jordan@cs.berkeley.edu), Jon McAuliffe (jon@stat.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1: TrustVI (an illustrative sketch of such a stochastic trust-region step appears below the table) |
| Open Source Code | No | The paper states 'We use the authors' Stan [21] implementation of ADVI, and implement the other two algorithms in Stan as well.' and cites a GitHub repository for Stan example models [22]. However, it does not explicitly provide a link to the authors' source code for TrustVI. |
| Open Datasets | Yes | Our study set comprises 183 statistical models and datasets from [22], an online repository of open-source Stan models and datasets. [22] Stan developers. https://github.com/stan-dev/example-models, 2017. [Online; accessed Jan 3, 2017; commit 6fbbf36f9d14ed69c7e6da2691a3dbe1e3d55dea]. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It mentions using '183 statistical models and datasets' and that 'For our trials, the variational distribution is always mean-field multivariate Gaussian', but no explicit split percentages or counts. |
| Hardware Specification | No | The paper mentions 'SIMD parallelism on modern CPUs and GPUs' but does not provide specific hardware details such as exact GPU/CPU models or processor types. |
| Software Dependencies | No | The paper states 'We use the authors' Stan [21] implementation of ADVI, and implement the other two algorithms in Stan as well.' While Stan is named, no specific version number for Stan or any other software library is provided. |
| Experiment Setup | Yes | Each stochastic gradient is based on a minibatch of 256 samples of the variational distribution. The numbers of variational samples for stochastic Hessian-vector products and for estimates of change (85 and 128, respectively) are selected to match the degree of parallelism for stochastic gradient computations. (The gradient-estimator sketch below the table illustrates this setup.) |
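
The experiment-setup row describes stochastic gradients built from minibatches of 256 samples of the variational distribution. As a rough illustration only (not the authors' Stan C++ code), the Python sketch below estimates a reparameterized ELBO gradient for a mean-field Gaussian; the function name `elbo_gradient`, the placeholder `logp_grad`, and the entropy handling are assumptions introduced here for exposition.

```python
# Illustrative sketch only: a reparameterized ELBO gradient estimate for a
# mean-field Gaussian q(z) = N(mu, diag(exp(log_sigma))^2). Not the authors'
# implementation; `logp_grad` (gradient of log p(x, z) w.r.t. z) and the
# default sample count are placeholders matching the setup quoted above.
import numpy as np

rng = np.random.default_rng(0)

def elbo_gradient(mu, log_sigma, logp_grad, n_samples=256):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, mu.size))
    z = mu + sigma * eps                          # reparameterization trick
    g = np.stack([logp_grad(zi) for zi in z])     # per-sample d log p(x, z) / dz
    grad_mu = g.mean(axis=0)
    # Chain rule through z = mu + exp(log_sigma) * eps, plus the Gaussian
    # entropy term, whose derivative w.r.t. each log_sigma_i is 1.
    grad_log_sigma = (g * eps).mean(axis=0) * sigma + 1.0
    return grad_mu, grad_log_sigma
```

For a standard-normal target, for example, one would pass `logp_grad = lambda z: -z`.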
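
The pseudocode row refers to Algorithm 1 (TrustVI), which takes stochastic trust-region steps on the ELBO using stochastic gradients and Hessian-vector products, accepting or rejecting each step based on a Monte Carlo estimate of the resulting change. The sketch below is a hedged illustration of that accept/reject pattern using a simple Cauchy-point subproblem solver; the acceptance threshold `eta`, the radius-update constants, and the helper names are assumptions, not the paper's exact rules.

```python
# Hedged sketch of one stochastic trust-region iteration in the spirit of
# Algorithm 1 (TrustVI). The subproblem solver, acceptance rule, and radius
# schedule below are simplified placeholders, not the paper's procedure.
import numpy as np

def cauchy_step(grad, hvp, radius):
    """Maximize the local quadratic model along the gradient within the radius."""
    g_norm = np.linalg.norm(grad)
    if g_norm == 0.0:
        return np.zeros_like(grad)
    gHg = grad @ hvp(grad)                  # stochastic Hessian-vector product
    if gHg >= 0.0:                          # non-concave curvature: go to the boundary
        t = radius
    else:
        t = min(radius, g_norm**3 / -gHg)   # unconstrained maximizer, clipped
    return (t / g_norm) * grad

def trust_region_iteration(theta, radius, grad_fn, hvp_fn, delta_fn, eta=0.1):
    """Propose a step, estimate the ELBO change, then grow or shrink the radius."""
    g = grad_fn(theta)
    step = cauchy_step(g, lambda v: hvp_fn(theta, v), radius)
    predicted = g @ step                    # first-order predicted improvement
    estimated = delta_fn(theta, step)       # Monte Carlo estimate of the actual change
    if estimated >= eta * predicted:        # accept: move and expand the region
        return theta + step, 2.0 * radius
    return theta, 0.5 * radius              # reject: stay and shrink the region
```

In this sketch, `grad_fn`, `hvp_fn`, and `delta_fn` stand in for the paper's stochastic gradient, Hessian-vector-product, and change-of-objective estimators (using 256, 85, and 128 variational samples, respectively, per the experiment-setup row).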