Provably Scalable Black-Box Variational Inference with Structured Variational Families

Authors: Joohwan Ko, Kyurae Kim, Woo Chang Kim, Jacob R. Gardner

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically verify our theoretical results on large-scale hierarchical models." "4. Experiments. We now empirically evaluate our theoretical analysis in Section 3. Mainly, we will compare the scalability of mean-field, full-rank, and the structured variational family described in Section 3.3."
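As a rough illustration of what is being compared (my own sketch, not the paper's parameterization), the three families differ in how the scale factor C of the variational covariance Σ = C Cᵀ is structured:

```julia
using LinearAlgebra, SparseArrays

d = 4  # illustrative dimensionality

# Mean-field: a diagonal scale factor, O(d) parameters.
C_meanfield = Diagonal(ones(d))

# Full-rank: a dense lower-triangular (Cholesky) factor, O(d^2) parameters.
C_fullrank = LowerTriangular(ones(d, d))

# Structured: a sparse lower-triangular factor; the pattern below is
# arbitrary and purely illustrative.
C_structured = sparse([1, 2, 3, 4, 4], [1, 2, 3, 1, 4], ones(5), d, d)

# In each case the variational covariance is Σ = C * C'.
Σ = C_structured * C_structured'
```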
Researcher Affiliation | Academia | "Joohwan Ko *1, Kyurae Kim *2, Woo Chang Kim 1, Jacob R. Gardner 2. 1 KAIST, Daejeon, Republic of Korea; 2 Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, U.S.A."
Pseudocode | No | The paper describes algorithms such as Stochastic Proximal Gradient Descent, but it does so in prose and mathematical notation rather than in structured pseudocode or algorithm blocks (a generic sketch of the update is given below).
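For context, here is a minimal sketch of stochastic proximal gradient descent in Julia (the paper's implementation language). This is my own illustration, not the authors' algorithm: the paper's proximal operator is specified in its Appendix B.4.1 and is not reproduced here, so `prox` and `grad_estimate` below are placeholder assumptions.

```julia
# Minimal sketch of stochastic proximal gradient descent (illustrative only).
# `grad_estimate(λ)` is assumed to return a stochastic gradient of the
# negative ELBO at the variational parameters λ, and `prox(x, γ)` applies
# a proximal operator with stepsize γ.
function proximal_sgd(λ0, grad_estimate, prox; γ = 1e-3, iters = 10_000)
    λ = copy(λ0)
    for _ in 1:iters
        g = grad_estimate(λ)       # stochastic gradient step ...
        λ = prox(λ .- γ .* g, γ)   # ... followed by the proximal map
    end
    return λ
end

# With the identity proximal map this reduces to plain SGD:
λ = proximal_sgd(randn(10), identity, (x, γ) -> x)
```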
Open Source Code | No | The paper mentions the software libraries and ecosystems used for the implementation (e.g., the Turing ecosystem, CUDA.jl, and Zygote.jl), but it does not provide an explicit statement or link to the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | "We use the rwm5yr German health registry doctor visit dataset (Hilbe, 2011) from the COUNT package in R (Hilbe, 2016)." "For the dataset, we take Crit Lang Acq from Wu et al. (2020)." "For the datasets, we use the exchange rate (FX) between 6 international currencies and the U.S. dollar."
Dataset Splits | No | The paper states: "To evaluate the effect of dataset size, we use subsets of the full datasets, as shown in Table 1." and describes how performance metrics are estimated: "We then estimate the minimum number of iterations T required to hit ε-accuracy such that r_{T−1} > ε and r_T ≤ ε. We set ε = 1 in all cases." However, it does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, absolute counts, or predefined splits from citations).
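The hitting-time criterion quoted above translates directly into code. A minimal sketch (my own illustration, not the authors' implementation) that returns the smallest T with r_{T−1} > ε and r_T ≤ ε:

```julia
# Given a sequence r of expected distances to the optimum, return the
# smallest T such that r[T-1] > ε and r[T] ≤ ε, or `nothing` if the
# sequence never crosses ε from above.
function hitting_time(r::AbstractVector, ε)
    for t in 2:length(r)
        r[t-1] > ε && r[t] <= ε && return t
    end
    return nothing
end

hitting_time([4.0, 2.5, 0.9, 0.8], 1.0)  # returns 3
```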
Hardware Specification | Yes | "Table 2: Computational Resources... Processor: 1× Intel i9-11900F, 2.5 GHz (maximum 5.2 GHz) per socket... Accelerator: 1× NVIDIA GeForce RTX 3090 per node, 1.7 GHz, 24 GiB RAM"
Software Dependencies | No | The paper names the software used: "implemented our experiments using the Turing ecosystem (Ge et al., 2018) in the Julia language (Bezanson et al., 2017). The structured covariances were implemented using the compressed sparse column (CSC) sparse matrix interface provided by the CUDA.jl library (Besard et al., 2019), while the sparse derivatives were implemented using the Zygote.jl framework (Innes, 2018)." However, specific version numbers for these software components are not provided (a sketch of the CSC interface in question follows below).
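For context, a minimal sketch of what the quoted stack looks like in use. This illustrates the CUDA.jl CSC interface under my own assumptions about the sparsity pattern; it is not the authors' code, requires a CUDA-capable GPU to run, and behavior may vary across CUDA.jl versions.

```julia
using LinearAlgebra, SparseArrays
using CUDA                      # requires a CUDA-capable GPU
using CUDA.CUSPARSE: CuSparseMatrixCSC

# Build a sparse lower-triangular scale factor on the CPU; the sparsity
# pattern here is arbitrary and purely illustrative.
d = 1_000
C_cpu = tril(sprandn(d, d, 0.01)) + sparse(1.0I, d, d)

# Move it to the GPU through the compressed sparse column (CSC) interface.
C_gpu = CuSparseMatrixCSC(C_cpu)

# Sparse matrix-vector products then run on the device.
z = CUDA.randn(d)
u = C_gpu * z
```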
Experiment Setup | Yes | "To quantitatively verify the theoretical results in Section 3, we use proximal SGD with the proximal operator described in Appendix B.4.1 to match the theory. For the target distribution, we use an isotropic Gaussian target distribution... All variational families are initialized with a standard Gaussian. We then run BBVI with M = 8 Monte Carlo samples, 50 different stepsizes in the interval of [10^-6, 1], and estimate the sequence of expected distance to the optimum... We set ε = 1 in all cases. We use 8 Monte Carlo samples and the Adam optimizer (Kingma & Ba, 2015) for all problems, while the reported ELBOs are estimated using 1024 Monte Carlo samples every 100 iterations. The variational families are Gaussian such that φ = 𝒩(0, 1)."
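To make the quoted evaluation protocol concrete, here is a minimal sketch of a Monte Carlo ELBO estimate for a Gaussian variational family against an isotropic Gaussian target. The dimensionality and parameters are my own illustrative choices; only the 1024-sample count mirrors the quoted setup.

```julia
using Distributions, LinearAlgebra, Statistics

# Illustrative Monte Carlo ELBO estimate: E_q[log p(z) - log q(z)].
d = 10
target = MvNormal(zeros(d), I)               # isotropic Gaussian target
q = MvNormal(randn(d), Diagonal(ones(d)))    # a mean-field variational family

function elbo(q, target; n_samples = 1024)
    zs = rand(q, n_samples)                  # d × n_samples matrix of draws
    mean(logpdf(target, zs) .- logpdf(q, zs))
end

elbo(q, target)                              # one 1024-sample ELBO estimate
```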