Marginalized Stochastic Natural Gradients for Black-Box Variational Inference

Authors: Geng Ji, Debora Sujono, Erik B Sudderth

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. |
| Researcher Affiliation | Collaboration | Facebook AI; Department of Computer Science, University of California, Irvine. |
| Pseudocode | Yes | Figure 3. Pyro implementation of MSNG. This inference code works for any discrete-variable model specified by valid Pyro model and guide functions. See the supplement for further details. |
| Open Source Code | No | The paper states "Our Pyro-integrated MSNG code provides a compelling method for scalable black-box variational inference." However, it does not explicitly state that this code is open source, nor does it provide a link to a repository. |
| Open Datasets | Yes | Following Ji et al. (2019), we infer topic activations in a noisy-OR topic model of documents from the tiny version of the 20 newsgroups dataset collected by Sam Roweis. On the binarized MNIST training dataset, we learn the edge weights w of a three-layer fully-connected sigmoid belief network using the public data-augmented variational training code by Gan et al. (2015). |
| Dataset Splits | No | The paper mentions training and test sets but does not specify the validation split details needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, or memory) used for the experiments. |
| Software Dependencies | No | The paper mentions software like Pyro and torch in code examples and discusses other PPLs (Edward, TensorFlow Probability, WebPPL, Gen, Stan), but it does not provide version numbers for these software components. |
| Experiment Setup | Yes | For each method, we evaluate sample sizes of M ∈ {1, 10, 100}, and then report results for the variant that converges with lowest runtime. This leads to a sample size of M = 1 for MSNG; 100 for SNG and REINFORCE; and 10 (noisy-OR, sigmoid, Countries probit) or 100 (all else) for SNG+CV, REINFORCE+CV, and CONCRETE. |
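The Experiment Setup row highlights that MSNG converges with only M = 1 sample, while score-function methods need 10 to 100. The intuition is that marginalizing a discrete latent variable replaces a noisy Monte Carlo average with an exact sum over its values. The toy sketch below (not the paper's code; the log-joint values and the Bernoulli mean `q` are made-up illustrations) contrasts the two estimators for a single binary latent:

```python
import math
import random

def elbo_term(z, q):
    """Toy integrand log p(x, z) - log q(z) for one Bernoulli latent z.

    The log-joint values are hypothetical placeholders for illustration;
    q is the variational Bernoulli mean.
    """
    log_p = [-1.0, -2.0][z]  # made-up log p(x, z) for z = 0 and z = 1
    log_q = math.log(q if z == 1 else 1.0 - q)
    return log_p - log_q

def marginalized(q):
    """Exact expectation: enumerate both values of the discrete latent."""
    return (1.0 - q) * elbo_term(0, q) + q * elbo_term(1, q)

def monte_carlo(q, m, rng):
    """Noisy M-sample estimate of the same expectation."""
    total = 0.0
    for _ in range(m):
        z = 1 if rng.random() < q else 0
        total += elbo_term(z, q)
    return total / m

q = 0.3
exact = marginalized(q)                       # closed form, zero variance
approx = monte_carlo(q, 100_000, random.Random(0))
print(exact, approx)
```

With only two latent values the exact sum costs the same as two samples yet has zero variance, which is why a marginalized estimator can afford M = 1 where REINFORCE-style estimators need many samples.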
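The "natural gradient" in the paper's title refers to preconditioning the gradient by the inverse Fisher information of the variational distribution. For an exponential family, a standard identity states that the natural gradient with respect to the natural parameter equals the ordinary gradient with respect to the mean parameter. The sketch below verifies this numerically for a Bernoulli distribution; the quadratic objective `loss_mean` is an arbitrary stand-in, not the paper's ELBO:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def loss_mean(p):
    """Hypothetical objective written in the Bernoulli mean parameter p."""
    return (p - 0.8) ** 2

def loss_natural(theta):
    """The same objective reparameterized by the natural (logit) parameter."""
    return loss_mean(sigmoid(theta))

def fd(f, x, h=1e-6):
    """Central finite-difference derivative."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

p = 0.3
theta = math.log(p / (1.0 - p))     # natural parameter (logit of p)
fisher = p * (1.0 - p)              # Fisher information of Bernoulli in theta

natural_grad = fd(loss_natural, theta) / fisher  # F^{-1} * dL/dtheta
mean_grad = fd(loss_mean, p)                     # dL/dp
print(natural_grad, mean_grad)      # the two gradients coincide
```

This identity is what makes natural-gradient updates cheap for mean-field exponential-family variational posteriors: no Fisher matrix ever needs to be inverted explicitly beyond a change of parameterization.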