On the Convergence of Black-Box Variational Inference

Authors: Kyurae Kim, Jisu Oh, Kaiwen Wu, Yi-An Ma, Jacob R. Gardner

NeurIPS 2023

Reproducibility Assessment

Research Type: Experimental
LLM Response: "We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems." "In Section 5, we evaluate the utility of proximal SGD on large-scale Bayesian inference problems."

Researcher Affiliation: Academia
LLM Response: Kyurae Kim, University of Pennsylvania (kyrkim@seas.upenn.edu); Jisu Oh, North Carolina State University (joh26@ncsu.edu); Kaiwen Wu, University of Pennsylvania (kaiwenwu@seas.upenn.edu); Yi-An Ma, University of California, San Diego (yianma@ucsd.edu); Jacob R. Gardner, University of Pennsylvania (jacobrg@seas.upenn.edu)

Pseudocode: Yes
LLM Response: "Algorithm 1: Prox Gen-Adam for Black-Box Variational Inference" (a hedged sketch of the proximal step follows this entry)
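The paper's Algorithm 1 (Prox Gen-Adam) is not reproduced in this summary. As a rough illustration of the proximal approach it builds on, the sketch below shows one proximal SGD step for a dense Gaussian variational family, where the Gaussian entropy term of the negative ELBO is handled exactly by a closed-form proximal operator on the diagonal of the Cholesky factor. All names are ours, and the base update is plain SGD rather than the paper's Adam-type preconditioned update.

```python
import numpy as np

def prox_neg_log(x, gamma):
    # Closed-form prox of f(c) = -gamma * log(c):
    # argmin_c -gamma*log(c) + 0.5*(c - x)**2 solves c**2 - x*c - gamma = 0,
    # whose positive root is returned (so the result is always > 0).
    return 0.5 * (x + np.sqrt(x ** 2 + 4.0 * gamma))

def proximal_sgd_step(m, C, grad_m, grad_C, gamma):
    # One proximal SGD step for q = N(m, C C^T). grad_m and grad_C are
    # stochastic gradients of the energy part of the negative ELBO only;
    # the entropy term -sum(log(diag(C))) is handled by the prox, which
    # also keeps the diagonal of C strictly positive.
    m_new = m - gamma * grad_m
    C_new = C - gamma * grad_C
    idx = np.diag_indices_from(C_new)
    C_new[idx] = prox_neg_log(C_new[idx], gamma)
    return m_new, C_new
```

Applying the prox only to the diagonal is the design point: the off-diagonal entries of C are unconstrained, while the diagonal must stay positive for C to be a valid Cholesky factor, and the prox enforces this without projections or reparameterization.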
Open Source Code: No
LLM Response: The paper does not provide an explicit statement of, or link to, its own open-source code for the methodology described.

Open Datasets: Yes
LLM Response: LME-election, a linear mixed effects model of the 1988 U.S. presidential election (Gelman & Hill, 2007); KEGG-undirected (Shannon et al., 2003); million songs (Bertin-Mahieux et al., 2011). The datasets were obtained from posteriordb (Magnusson et al., 2022); a hypothetical loading sketch follows this entry.
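For reference, fetching a model and its data from posteriordb typically looks like the sketch below. This assumes the posteriordb Python client and a local clone of the database; the posterior name and the exact method names are our assumptions from the client's documentation and may differ by version.

```python
# Sketch only: assumes the posteriordb Python client and a local clone
# of the posterior_database repository. The posterior name below is
# hypothetical; method names may differ across client versions.
from posteriordb import PosteriorDatabase

pdb = PosteriorDatabase("path/to/posterior_database")
posterior = pdb.posterior("election88-election88_full")   # hypothetical name
data = posterior.data.values()            # model data as a plain dict
stan_code = posterior.model.code("stan")  # reference Stan implementation
```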
Dataset Splits: No
LLM Response: The paper mentions batch sizes and Monte Carlo samples, but does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts per split).

Hardware Specification: Yes
LLM Response: Table 1 (Computational Resources):
  System topology: 2 nodes with 2 sockets each with 24 logical threads (48 threads total)
  Processor: 1 Intel Xeon Silver 4310, 2.1 GHz (maximum 3.3 GHz) per socket
  Cache: 1.1 MiB L1, 30 MiB L2, and 36 MiB L3
  Memory: 250 GiB RAM
  Accelerator: 1 NVIDIA RTX A5000 per node, 2 GHz, 24 GB RAM

Software Dependencies: No
LLM Response: The paper mentions Turing (Ge et al., 2018) and Adam (Kingma & Ba, 2015) but does not provide specific version numbers for these or other software dependencies used in the experiments.
Experiment Setup: Yes
LLM Response: "We run all algorithms with a fixed stepsize... We implement doubly stochastic subsampling (Titsias & Lázaro-Gredilla, 2014) with a batch size of B = 100 (B = 500 for BT-tennis) with M = 10 Monte Carlo samples. ... The results shown used a base stepsize of γ = 10^-3, while the initial point was m0 = 0, C0 = I." (A sketch of the doubly stochastic estimator follows this entry.)
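To make this setup concrete, the sketch below shows the doubly stochastic ELBO estimate the quote describes: a minibatch of B data points rescales the log-likelihood, and M reparameterized draws from q average the Monte Carlo noise. All function names are ours, and this Python sketch stands in for the paper's Julia/Turing implementation.

```python
import numpy as np

def elbo_hat(m, C, batch, n_total, log_lik, log_prior, M=10, rng=None):
    # Doubly stochastic ELBO estimate for q = N(m, C C^T), in the spirit
    # of Titsias & Lazaro-Gredilla (2014): subsample the data (batch of
    # size B, likelihood rescaled by n_total / B) and average over M
    # reparameterized Monte Carlo samples z = m + C @ eps, eps ~ N(0, I).
    rng = rng or np.random.default_rng()
    d = len(m)
    scale = n_total / len(batch)
    # Entropy of a Gaussian with Cholesky factor C (closed form).
    entropy = 0.5 * d * np.log(2.0 * np.pi * np.e) + np.sum(np.log(np.diag(C)))
    energy = 0.0
    for _ in range(M):
        z = m + C @ rng.standard_normal(d)  # reparameterization trick
        energy += scale * sum(log_lik(z, x) for x in batch) + log_prior(z)
    return energy / M + entropy
```

In practice the gradient of this estimate with respect to (m, C) would be taken with an autodiff framework; under proximal SGD, the entropy term would instead be dropped from the objective and handled by the prox step sketched earlier.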