Provably Scalable Black-Box Variational Inference with Structured Variational Families
Authors: Joohwan Ko, Kyurae Kim, Woo Chang Kim, Jacob R. Gardner
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify our theoretical results on large-scale hierarchical models. ... We now empirically evaluate our theoretical analysis in Section 3. Mainly, we will compare the scalability of mean-field, full-rank, and the structured variational family described in Section 3.3. |
| Researcher Affiliation | Academia | Joohwan Ko*¹, Kyurae Kim*², Woo Chang Kim¹, Jacob R. Gardner². ¹KAIST, Daejeon, Republic of Korea; ²Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA, U.S.A. |
| Pseudocode | No | The paper describes algorithms such as stochastic proximal gradient descent, but it does so in prose and mathematical notation rather than structured pseudocode or algorithm blocks (a hedged sketch of such a proximal step is given after this table). |
| Open Source Code | No | The paper mentions using specific software libraries and ecosystems for implementation (e.g., "Turing ecosystem", "CUDA.jl", "Zygote.jl"), but it does not provide an explicit statement or link to the authors' own source code for the methodology described in the paper. |
| Open Datasets | Yes | We use the rwm5yr German health registry doctor visit dataset (Hilbe, 2011) from the COUNT package in R (Hilbe, 2016). For the dataset, we take CritLangAcq from Wu et al. (2020). For the datasets, we use the exchange rate (FX) between 6 international currencies and the U.S. dollar. |
| Dataset Splits | No | The paper states: "To evaluate the effect of dataset size, we use subsets of the full datasets, as shown in Table 1." and describes how performance metrics are estimated: "We then estimate the minimum number of iterations T_ε required to hit ε-accuracy such that d_{T−1} > ε and d_T ≤ ε. We set ε = 1 in all cases." However, it does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages, absolute counts, or predefined splits from citations). The hitting-iteration criterion itself is sketched after this table. |
| Hardware Specification | Yes | Table 2: Computational Resources... Processor: 1 Intel i9-11900F, 2.5 GHz (maximum 5.2 GHz) per socket... Accelerator: 1 NVIDIA GeForce RTX 3090 per node, 1.7 GHz, 24 GiB RAM |
| Software Dependencies | No | The paper mentions software used: "implemented our experiments using the Turing ecosystem (Ge et al., 2018) in the Julia language (Bezanson et al., 2017). The structured covariances were implemented using the compressed sparse column (CSC) sparse matrix interface provided by the CUDA.jl library (Besard et al., 2019), while the sparse derivatives were implemented using the Zygote.jl framework (Innes, 2018)." However, specific version numbers for these software components are not provided. (A hedged sketch of sampling with a sparse structured scale factor also follows the table.) |
| Experiment Setup | Yes | To quantitatively verify the theoretical results in Section 3, we use proximal SGD with the proximal operator described in Appendix B.4.1 to match the theory. For the target distribution, we use an isotropic Gaussian target distribution... All variational families are initialized with a standard Gaussian. We then run BBVI with M = 8 Monte Carlo samples and 50 different stepsizes in the interval [10⁻⁶, 1], and estimate the sequence of expected distances to the optimum... We set ε = 1 in all cases. We use 8 Monte Carlo samples and the Adam optimizer (Kingma & Ba, 2015) for all problems, while the reported ELBOs are estimated using 1024 Monte Carlo samples every 100 iterations. The variational families are Gaussian such that φ = 𝒩(0, 1). |
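
The quoted setup (proximal SGD, 8 Monte Carlo samples, an isotropic Gaussian target, standard-Gaussian initialization) can be illustrated with a minimal sketch. The sketch below assumes a diagonal (mean-field) Gaussian family and handles the entropy term with the closed-form proximal operator of the negative log; this is an illustrative choice, not the operator from the paper's Appendix B.4.1, and names such as `prox_neg_log` and `proximal_sgd_bbvi` are made up here.

```python
import numpy as np

rng = np.random.default_rng(0)

MU, SIGMA = 2.0, 0.5  # isotropic Gaussian target N(MU, SIGMA^2 I)

def grad_log_target(z):
    # Score of the target: log p(z) = -||z - MU||^2 / (2 SIGMA^2) + const.
    return -(z - MU) / SIGMA**2

def prox_neg_log(s, gamma):
    # Closed-form prox of h(s) = -sum(log s_i), the diagonal-Gaussian entropy
    # term: solves min_u -gamma*log(u) + (u - s)^2 / 2 componentwise, which
    # keeps every scale strictly positive. Illustrative, not Appendix B.4.1.
    return 0.5 * (s + np.sqrt(s**2 + 4.0 * gamma))

def proximal_sgd_bbvi(d=10, n_iters=1000, stepsize=1e-2, n_mc=8):
    m = np.zeros(d)  # variational mean, standard-Gaussian initialization
    s = np.ones(d)   # variational scales (diagonal family)
    for _ in range(n_iters):
        eta = rng.standard_normal((n_mc, d))  # base samples
        z = m + s * eta                       # reparameterized samples
        g = grad_log_target(z)
        m = m + stepsize * g.mean(axis=0)     # SGD ascent on the smooth part
        # Nonsmooth entropy term handled by the proximal step:
        s = prox_neg_log(s + stepsize * (g * eta).mean(axis=0), stepsize)
    return m, s

m, s = proximal_sgd_bbvi()
print(m.round(2)[:3], s.round(2)[:3])  # should approach MU = 2.0 and SIGMA = 0.5
```

The split is what distinguishes proximal SGD from plain SGD on the ELBO: the smooth expected-log-density part is handled by the stochastic gradient, while the prox enforces the entropy term (and positivity of the scales) exactly.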
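The implementation notes quoted above mention a CSC sparse-matrix representation of the structured covariances (via CUDA.jl). Below is a rough CPU analogue in SciPy, assuming a bordered block-diagonal sparsity pattern of the kind natural for hierarchical models: a dense lower-triangular factor for the global parameters, a diagonal block per local group, and a border coupling locals to globals. The actual pattern of the paper's family is defined in its Section 3.3; every name here is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def structured_scale(d_global, n_local, d_local, rng):
    # Sparse lower-triangular scale factor C in CSC format.
    G = sp.csc_matrix(np.tril(rng.standard_normal((d_global, d_global))))
    local_diags = sp.block_diag(
        [sp.diags(np.abs(rng.standard_normal(d_local))) for _ in range(n_local)]
    )
    border = sp.csc_matrix(rng.standard_normal((n_local * d_local, d_global)))
    return sp.bmat([[G, None], [border, local_diags]], format="csc")

rng = np.random.default_rng(0)
C = structured_scale(d_global=3, n_local=100, d_local=2, rng=rng)
d = C.shape[0]
eta = rng.standard_normal(d)
z = C @ eta  # reparameterized sample with covariance C C^T
print(C.nnz, "nonzeros vs", d * (d + 1) // 2, "for a full-rank factor")
```

Sampling then costs O(nnz(C)) rather than the O(d²) of a full-rank factor, which is the mechanism behind the mean-field vs. full-rank vs. structured scalability comparison in the quoted experiments.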
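Finally, the ε-accuracy criterion reconstructed in the Dataset Splits row has a direct reading as a hitting time over the estimated distance sequence. The sketch below pairs it with the 50-point stepsize grid on [10⁻⁶, 1] from the quoted setup; the distance curve is synthetic and `hitting_iteration` is an illustrative name.

```python
import numpy as np

def hitting_iteration(dists, eps=1.0):
    # Smallest T with d_{T-1} > eps and d_T <= eps, mirroring the quoted
    # criterion; returns None if eps-accuracy is never reached.
    for t in range(1, len(dists)):
        if dists[t - 1] > eps and dists[t] <= eps:
            return t
    return None

stepsizes = np.logspace(-6, 0, num=50)        # 50 stepsizes spanning [1e-6, 1]
dists = 4.0 * np.exp(-0.05 * np.arange(200))  # synthetic distance-to-optimum curve
print(hitting_iteration(dists, eps=1.0))      # -> 28
```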