Boosting Black Box Variational Inference

Authors: Francesco Locatello, Gideon Dresdner, Rajiv Khanna, Isabel Valera, Gunnar Rätsch

NeurIPS 2018

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Section 5, "Experimental evaluation": "Notably, our VI algorithm is black box in the sense that it leaves the definition of the model and the choice of variational family up to the user. Therefore, we are able to reuse the same boosting black box VI solver to run all our experiments, and more generally, any probabilistic model and choice of variational family."

Researcher Affiliation | Academia | "Francesco Locatello 1,2, Gideon Dresdner 2, Rajiv Khanna 3, Isabel Valera 1, and Gunnar Rätsch 2. 1 Max-Planck Institute for Intelligent Systems, Germany. 2 Dept. for Computer Science, ETH Zurich, Universitätsstrasse 6, 8092 Zurich, Switzerland. 3 The University of Texas at Austin, USA."

Pseudocode | Yes | Algorithm 1: Affine Invariant Frank-Wolfe
  1: init q_0 ∈ conv(A), S := {q_0}, and accuracy δ > 0
  2: for t = 0 ... T
  3:   Find s_t := (Approx-)LMO_A(∇f(q_t))
  4:   Variant 0: γ = 2/(δt + 2)
  5:   Variant 1: γ = argmin_{γ ∈ [0,1]} f((1 - γ)q_t + γ s_t)
  6:   q_{t+1} := (1 - γ)q_t + γ s_t
  7:   Variant 2: S := S ∪ {s_t}
  8:   q_{t+1} := argmin_{q ∈ conv(S)} f(q)
  9: end for

Open Source Code | Yes | "Code to reproduce the experiments is available at: https://github.com/ratschlab/boosting-bbvi."

Open Datasets | Yes | "For the mortality prediction task, we used a preprocessed dataset created by the authors of [3] from the EICU COLLABORATIVE RESEARCH database [4]. ... We use the CBCL FACE dataset which is composed of 2,492 images of 361 pixels each, arranged into a matrix." (http://cbcl.mit.edu/software-datasets/FaceData2.html)

Dataset Splits | No | The paper mentions "We performed a 70-30% train-test split" for the EICU dataset and "We use the training log-likelihood to select the best iteration", but does not explicitly describe a validation split or a strategy such as k-fold cross-validation.

Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types and speeds, memory amounts, or detailed machine specifications) used to run its experiments.

Software Dependencies | No | The paper mentions the "Edward probabilistic programming framework [25]" but does not specify a version number or other software dependencies with their versions.

Experiment Setup | Yes | "We run these baseline VI experiments for 10,000 iterations which is orders of magnitude more than what is required for convergence. Unless otherwise noted, we use Gaussians as our base family. ... We found that λ = 1/√(t+1) worked well in all the experiments. ... For this experiment, we ran our algorithm for 35 iterations and found that iteration 17 had the best performance. ... We ran our algorithm for 29 iterations and again found that iteration 17 had the best performance."
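The step-size variants of Algorithm 1 can be illustrated as a generic Frank-Wolfe loop. The sketch below is not the paper's implementation: the objective f, its gradient grad_f, and the (approximate) linear minimization oracle lmo are hypothetical placeholders supplied by the caller, and the line search in Variant 1 is approximated by a coarse grid.

```python
import numpy as np

def frank_wolfe(f, grad_f, lmo, q0, T=50, delta=1.0, variant=0):
    """Illustrative sketch of Algorithm 1 (Variants 0 and 1).

    f      : objective over the convex hull of the atom set A
    grad_f : gradient of f
    lmo    : (approximate) linear minimization oracle over A
    q0     : initial iterate in conv(A)
    delta  : LMO accuracy parameter used in the Variant 0 schedule
    """
    q = np.asarray(q0, dtype=float)
    for t in range(T):
        s = lmo(grad_f(q))                    # step 3: s_t := LMO_A(grad f(q_t))
        if variant == 0:
            gamma = 2.0 / (delta * t + 2.0)   # step 4: fixed schedule
        else:
            # step 5: line search over gamma in [0, 1], coarse grid here
            grid = np.linspace(0.0, 1.0, 101)
            gamma = min(grid, key=lambda g: f((1.0 - g) * q + g * s))
        q = (1.0 - gamma) * q + gamma * s     # step 6: convex combination update
    return q
```

For example, minimizing a quadratic over the probability simplex (atoms = standard basis vectors) only needs an LMO that returns the vertex with the smallest gradient coordinate; Variant 2, which re-optimizes the weights over all atoms collected in S, is omitted here because it requires a full inner solver.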