Flexible mean field variational inference using mixtures of non-overlapping exponential families

Authors: Jeffrey Spence

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the applicability of the theoretical results presented in Section 3, I test sparse VI schemes using Theorems 1 and 2, which I refer to as the non-overlapping mixtures trick, on two models and compare these schemes to non-sparse and naive VI approximations, showing the superior performance of treating sparsity exactly. Python implementations of the simulations, fitting procedures, and plotting are available at https://github.com/jeffspence/non_overlapping_mixtures.
Researcher Affiliation | Academia | Jeffrey P. Spence, Stanford University, Stanford, CA 94305, jspence@stanford.edu
Pseudocode | No | The paper describes algorithms and update rules but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | Yes | Python implementations of the simulations, fitting procedures, and plotting are available at https://github.com/jeffspence/non_overlapping_mixtures.
Open Datasets | No | I simulated data under the LDpred model to compare two VI schemes. For each dataset, I simulated 500 points with 10000 dimensions. The paper uses simulated data, not a publicly accessible dataset.
Dataset Splits | No | The paper describes simulating data and running experiments, but does not provide specific train/validation/test split information.
Hardware Specification | Yes | All simulations were run on a 2013 MacBook Pro with an Intel i7-4770K CPU.
Software Dependencies | No | The paper mentions software packages like NIMBLE and Pyro, and the Adam optimizer, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | I simulated a 1000-dimensional vector β̂ from the LDpred model with p0 = 0.99 so that on average about 10 sites had non-zero effects... I set σ_1^2 to be 1 and then varied σ_e^2 from 0.05 to 1.0. For BBBVI, I used the equivalent formulation... I used 2 particles to stochastically estimate gradients, 50 components in the mixture distribution, and 2000 gradient steps per mixture component. The BBBVI objective functions were optimized using the Adam optimizer [33] with learning rate 10^-3 and default parameters otherwise. For inference I set σ_1^2 = 0.5, σ_e^2 = 1, and p0 = 1 - 100/10000. For all runs, I used K = 2 to project onto a two-dimensional space to facilitate visualization.
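
The quoted setup lends itself to a short simulation sketch. The code below is a minimal illustration, not the paper's implementation (which is in the linked repository): it assumes the simplified spike-and-slab generative model in which each effect β_j is zero with probability p0 and drawn from N(0, σ_1^2) otherwise, with the observed β̂_j adding independent N(0, σ_e^2) noise. The function name and arguments are illustrative only.

```python
import numpy as np

# Sketch of drawing one dataset under a simplified spike-and-slab
# (LDpred-style) model with the settings quoted in the Experiment Setup row.
# Assumption (not taken from the paper's code): beta_j = 0 with probability p0,
# beta_j ~ N(0, sigma2_1) otherwise, and beta_hat_j = beta_j + N(0, sigma2_e) noise.
def simulate_ldpred(n_dims=1000, p0=0.99, sigma2_1=1.0, sigma2_e=0.05, seed=0):
    rng = np.random.default_rng(seed)
    nonzero = rng.random(n_dims) > p0  # roughly (1 - p0) of sites get an effect
    beta = np.where(nonzero, rng.normal(0.0, np.sqrt(sigma2_1), n_dims), 0.0)
    beta_hat = beta + rng.normal(0.0, np.sqrt(sigma2_e), n_dims)
    return beta, beta_hat

# Example: sweep the noise variance over part of the quoted range (0.05 to 1.0).
for sigma2_e in (0.05, 0.25, 0.5, 1.0):
    beta, beta_hat = simulate_ldpred(sigma2_e=sigma2_e)
    print(sigma2_e, int((beta != 0).sum()), "non-zero effects")
```

With p0 = 0.99 and 1000 dimensions, each draw has about 10 non-zero effects, matching the description quoted above.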