Marginalized Stochastic Natural Gradients for Black-Box Variational Inference
Authors: Geng Ji, Debora Sujono, Erik B. Sudderth
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We integrate our method with the probabilistic programming language Pyro and evaluate real-world models of documents, images, networks, and crowd-sourcing. |
| Researcher Affiliation | Collaboration | Facebook AI; Department of Computer Science, University of California, Irvine. |
| Pseudocode | Yes | Figure 3. Pyro implementation of MSNG. This inference code works for any discrete-variable model specified by valid Pyro model and guide functions. See the supplement for further details. (A minimal sketch of such a model/guide pair appears after this table.) |
| Open Source Code | No | The paper states 'Our Pyro-integrated MSNG code provides a compelling method for scalable black-box variational inference.' However, it does not explicitly state that this code is open-source, nor does it provide a link to a repository. |
| Open Datasets | Yes | Following Ji et al. (2019), we infer topic activations in a noisy-OR topic model of documents from the tiny version of the 20 newsgroups dataset collected by Sam Roweis. On the binarized MNIST training dataset, we learn the edge weights w of a three-layer fully-connected sigmoid belief network using the public data-augmented variational training code by Gan et al. (2015). |
| Dataset Splits | No | The paper mentions training and test sets but does not specify the splits (e.g., a validation set) in the detail needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'Pyro' and 'torch' in code examples and discusses other PPLs (Edward, TensorFlow Probability, WebPPL, Gen, Stan), but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For each method, we evaluate sample sizes of M ∈ {1, 10, 100}, and then report results for the variant that converges with lowest runtime. This leads to a sample size of M = 1 for MSNG; 100 for SNG and REINFORCE; and 10 (noisy-OR, sigmoid, Countries probit) or 100 (all else) for SNG+CV, REINFORCE+CV, and CONCRETE. (The second sketch below shows how M maps onto a stock Pyro training loop.) |
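
The paper's Figure 3 (not reproduced here) gives Pyro inference code for MSNG that accepts any discrete-variable model written as a standard Pyro model/guide pair. The sketch below shows what such a pair looks like; the toy single-Bernoulli model, the emission probabilities, and the parameter name `q` are illustrative assumptions, not the models evaluated in the paper.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints

# Toy discrete-variable model (illustrative, not from the paper):
# one Bernoulli latent z that switches the likelihood of binary data.
def model(data):
    z = pyro.sample("z", dist.Bernoulli(0.5))
    probs = 0.9 * z + 0.1 * (1.0 - z)  # emission probability depends on z
    with pyro.plate("data", data.shape[0]):
        pyro.sample("x", dist.Bernoulli(probs), obs=data)

# Mean-field guide: a learnable Bernoulli posterior over z.
def guide(data):
    q = pyro.param("q", torch.tensor(0.5), constraint=constraints.unit_interval)
    pyro.sample("z", dist.Bernoulli(q))
```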
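The MSNG optimizer itself is not publicly released, so as a stand-in the second sketch reuses the `model` and `guide` above with Pyro's stock `Trace_ELBO`, whose `num_particles` argument plays the role of the per-step sample size M swept in the quoted setup; the learning rate, step count, and toy data are arbitrary assumptions.

```python
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

M = 100  # per-step sample size, in the role of the paper's M (100 is the value quoted for REINFORCE)
data = torch.tensor([1.0, 1.0, 0.0, 1.0])  # toy binary observations

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO(num_particles=M))
for step in range(500):
    loss = svi.step(data)  # one stochastic gradient step on the ELBO
```

For a discrete latent sampled in the guide, `Trace_ELBO` falls back to score-function (REINFORCE-style) gradients, which is one of the baselines in the quoted setup; MSNG would replace this estimator with its marginalized stochastic natural gradient.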