Variational Inference via $\chi$ Upper Bound Minimization
Authors: Adji Bousso Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David Blei
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3 (Empirical Study): "We developed CHIVI, a black box variational inference algorithm for minimizing the χ-divergence. We now study CHIVI with several models: probit regression, Gaussian process (GP) classification, and Cox processes." ... Table 1: Test error for Bayesian probit regression. ... Table 2: Test error for Gaussian process classification. ... Table 3: Average L1 error for posterior uncertainty estimates (ground truth from HMC). |
| Researcher Affiliation | Academia | Adji B. Dieng Columbia University Dustin Tran Columbia University Rajesh Ranganath Princeton University John Paisley Columbia University David M. Blei Columbia University |
| Pseudocode | Yes | Algorithm 1: χ-divergence variational inference (CHIVI) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We illustrate sandwich estimation on UCI datasets. ... With UCI benchmark datasets, we compared the predictive performance of CHIVI to EP and Laplace. ... We apply Cox processes to model the spatial locations of shots (made and missed) from the 2015-2016 NBA season [35]. |
| Dataset Splits | Yes | We split all the datasets with 90% of the data for training and 10% for testing. ... The error rates for CHIVI correspond to the average of 10 error rates obtained by dividing the data into 10 folds, applying CHIVI to 9 folds to learn the variational parameters and performing prediction on the remainder. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Edward [33] but does not provide specific version numbers for Edward or any other software dependencies. |
| Experiment Setup | Yes | We used a minibatch size of 64 and 2000 iterations for each batch. ... The kernel hyperparameters were chosen using grid search. |
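The pseudocode row above refers to Algorithm 1 (CHIVI), which minimizes an exponentiated χ-divergence upper bound (the CUBO). As a rough illustration of the quantity being estimated, the sketch below computes a Monte Carlo estimate of CUBO_n = (1/n) log E_q[(p(x,z)/q(z))^n] for a toy 1D Gaussian target; the function names, toy densities, and sample sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    # Unnormalized log density of a toy target, proportional to N(1, 2^2).
    return -0.5 * (z - 1.0) ** 2 / 4.0

def cubo(mu, log_sigma, n=2.0, num_samples=1000):
    """Monte Carlo estimate of CUBO_n = (1/n) * log E_q[(p/q)^n]
    for a Gaussian variational family q = N(mu, sigma^2)."""
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.standard_normal(num_samples)
    log_q = (-0.5 * ((z - mu) / sigma) ** 2
             - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    log_w = log_p(z) - log_q  # importance log-weights log(p/q)
    # log-sum-exp for numerical stability when exponentiating n * log_w
    m = np.max(n * log_w)
    return (m + np.log(np.mean(np.exp(n * log_w - m)))) / n

# When q matches the normalized target exactly (mu=1, sigma=2), the
# log-weights are constant and CUBO equals the log normalizer exactly.
tight = cubo(1.0, np.log(2.0))
```

CHIVI itself optimizes the exponentiated bound exp(n · CUBO_n) with stochastic gradients, which keeps the Monte Carlo gradient estimator unbiased; the sketch only shows the objective being bounded.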