Automatic Variational Inference in Stan
Authors: Alp Kucukelbir, Rajesh Ranganath, Andrew Gelman, David Blei
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare ADVI to MCMC sampling across hierarchical generalized linear models, nonconjugate matrix factorization, and a mixture model. We train the mixture model on a quarter million images. |
| Researcher Affiliation | Academia | Alp Kucukelbir, Columbia University (alp@cs.columbia.edu); Rajesh Ranganath, Princeton University (rajeshr@cs.princeton.edu); Andrew Gelman, Columbia University (gelman@stat.columbia.edu); David M. Blei, Columbia University (david.blei@columbia.edu) |
| Pseudocode | Yes | Algorithm 1: Automatic differentiation variational inference (ADVI) |
| Open Source Code | Yes | We propose an automatic variational inference algorithm, automatic differentiation variational inference (ADVI); we implement it in Stan (code available), a probabilistic programming system. |
| Open Datasets | Yes | Here, we show how easy it is to explore new models using ADVI. In both models, we use the Frey Face dataset, which contains 1956 frames (28 × 20 pixels) of facial expressions extracted from a video sequence. We explore the ImageCLEF dataset, which has 250,000 images [25]. |
| Dataset Splits | No | The paper mentions training sets and held-out/evaluation sets (e.g., 'We use 10 000 training samples and hold out 1000 for testing'), but it does not explicitly define or use a separate 'validation' split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | ADVI is available in Stan 2.8. See Appendix C. |
| Experiment Setup | Yes | We approximate the posterior predictive likelihood using an MC estimate. For MCMC, we plug in posterior samples. For ADVI, we draw samples from the posterior approximation during the optimization. We initialize ADVI with a draw from a standard Gaussian. We study ADVI with two settings of M, the number of MC samples used to estimate gradients. A single sample per iteration is sufficient; it is also the fastest. (We set M = 1 from here on.) |
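
The paper's Algorithm 1 estimates the ELBO gradient with Monte Carlo samples from a reparameterized mean-field Gaussian approximation, and the setup row notes that a single sample per iteration (M = 1) suffices. As a hedged illustration of that idea — not the Stan implementation — here is a minimal NumPy sketch on a toy conjugate model (Gaussian mean with known variance); the model, fixed step size, and gradient clipping are assumptions of this example, not details from the paper:

```python
import numpy as np

# Toy model (an assumption for this sketch): x_i ~ N(theta, 1), prior theta ~ N(0, 1).
# Mean-field Gaussian approximation q(theta) = N(mu, exp(omega)^2).
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=100)
n = len(x)

def grad_log_joint(theta):
    # d/d theta [ log N(theta; 0, 1) + sum_i log N(x_i; theta, 1) ]
    return -theta + np.sum(x - theta)

mu, omega = 0.0, 0.0   # q starts as a standard Gaussian
lr = 0.001             # fixed step size (the paper uses an adaptive step-size sequence)
for _ in range(20_000):
    eta = rng.normal()            # single MC sample per iteration (M = 1)
    sigma = np.exp(omega)
    theta = mu + sigma * eta      # reparameterization: theta = mu + sigma * eta
    g = grad_log_joint(theta)
    mu    += lr * np.clip(g, -50, 50)                      # ELBO gradient w.r.t. mu
    omega += lr * np.clip(g * sigma * eta + 1.0, -50, 50)  # chain rule + entropy term

# Exact posterior for this conjugate model: N(sum(x)/(n+1), 1/(n+1))
print(mu, np.exp(omega), x.sum() / (n + 1), (n + 1) ** -0.5)
```

For this conjugate model the exact posterior is Gaussian, so the mean-field approximation can match it closely and the printed pairs should roughly agree. The clipping and constant step size are simplifications for a short, stable example.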
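
The evaluation described in the setup row approximates the posterior predictive likelihood by a Monte Carlo average over draws: posterior samples for MCMC, or samples from the variational approximation for ADVI. A hedged sketch of that estimator, using a Gaussian N(theta, 1) observation model purely for illustration (the function name, model, and posterior used in the example are assumptions, not the paper's code):

```python
import numpy as np

def log_pred_likelihood(x_held_out, theta_draws):
    """MC estimate of sum_j log p(x_j | data), with
    p(x_j | data) ~= (1/S) * sum_s p(x_j | theta_s),
    where theta_s are posterior draws (MCMC) or draws from q (ADVI).
    Gaussian N(theta, 1) observation model assumed for illustration."""
    # log p(x_j | theta_s) for every pair (held-out point j, draw s)
    ll = -0.5 * np.log(2 * np.pi) - 0.5 * (x_held_out[:, None] - theta_draws[None, :]) ** 2
    S = theta_draws.shape[0]
    # log-mean-exp over draws (numerically stable), summed over held-out points
    return float(np.sum(np.logaddexp.reduce(ll, axis=1) - np.log(S)))

# Example with an assumed posterior N(2, 0.1^2); the exact predictive is then N(2, 1.01)
rng = np.random.default_rng(1)
draws = rng.normal(2.0, 0.1, size=20_000)
x_new = np.array([1.5, 2.0, 2.5])
est = log_pred_likelihood(x_new, draws)
exact = np.sum(-0.5 * np.log(2 * np.pi * 1.01) - 0.5 * (x_new - 2.0) ** 2 / 1.01)
```

Averaging densities (not log densities) before taking the log is what makes this a predictive-likelihood estimate; the log-sum-exp reduction keeps it stable when individual likelihoods underflow.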