Flexible mean field variational inference using mixtures of non-overlapping exponential families
Author: Jeffrey P. Spence
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To show the applicability of the theoretical results presented in Section 3, I test sparse VI schemes using Theorems 1 and 2 (which I refer to as the non-overlapping mixtures trick) on two models, and compare these schemes to non-sparse and naive VI approximations, showing the superior performance of treating sparsity exactly. Python implementations of the simulations, fitting procedures, and plotting are available at https://github.com/jeffspence/non_overlapping_mixtures. A per-coordinate sketch of this trick follows the table. |
| Researcher Affiliation | Academia | Jeffrey P. Spence, Stanford University, Stanford, CA 94305. jspence@stanford.edu |
| Pseudocode | No | The paper describes algorithms and update rules but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Python implementations of the simulations, fitting procedures, and plotting are available at https://github.com/jeffspence/non_overlapping_mixtures. |
| Open Datasets | No | I simulated data under the LDpred model to compare two VI schemes. For each dataset, I simulated 500 points with 10000 dimensions. The paper uses simulated data rather than a publicly accessible dataset; a simulation sketch follows the table. |
| Dataset Splits | No | The paper describes simulating data and running experiments, but does not provide specific train/validation/test split information. |
| Hardware Specification | Yes | All simulations were run on a 2013 MacBook Pro with an Intel i7-4770K CPU. |
| Software Dependencies | No | The paper mentions software packages such as NIMBLE and Pyro, and the Adam optimizer, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | I simulated a 1000-dimensional vector β̂ from the LDpred model with p₀ = 0.99 so that on average about 10 sites had non-zero effects... I set σ₁² to be 1 and then varied σₑ² from 0.05 to 1.0. For BBBVI, I used the equivalent formulation... I used 2 particles to stochastically estimate gradients, 50 components in the mixture distribution, and 2000 gradient steps per mixture component. The BBBVI objective functions were optimized using the adam optimizer [33] with learning rate 10⁻³ and default parameters otherwise. For inference I set σ₁² = 0.5, σₑ² = 1, and p₀ = 1 − 100/10000. For all runs, I used K = 2 to project onto a two-dimensional space to facilitate visualization. |
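
For context on the simulated data quoted above: the LDpred model places a spike-and-slab prior on per-site effect sizes. The snippet below is a minimal sketch of how such data could be generated, assuming an identity LD matrix (no linkage disequilibrium) so that sites are independent; all variable names are illustrative, and this is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 1000        # number of sites, matching the quoted setup
p0 = 0.99       # prior probability that a site has zero effect
sigma2_1 = 1.0  # slab (effect-size) variance
sigma2_e = 0.5  # noise variance (the paper varies this from 0.05 to 1.0)

# True effects: zero with probability p0, drawn from the Gaussian slab otherwise.
is_causal = rng.random(P) > p0
beta = np.where(is_causal, rng.normal(0.0, np.sqrt(sigma2_1), size=P), 0.0)

# Noisy effect estimates; an identity LD matrix is assumed here for simplicity.
beta_hat = beta + rng.normal(0.0, np.sqrt(sigma2_e), size=P)
```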
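Under that independence assumption, the posterior factorizes across sites, and each coordinate's posterior is exactly a mixture of a point mass at zero and a Gaussian, i.e., a mixture of non-overlapping exponential families of the kind the paper's Theorems 1 and 2 handle. Below is a hedged sketch of the per-coordinate computation; the function `spike_slab_posterior` is hypothetical, not from the paper's repository.

```python
import numpy as np
from scipy.stats import norm

def spike_slab_posterior(beta_hat_j, p0, sigma2_1, sigma2_e):
    """Exact posterior for one coordinate: the probability the site is causal,
    plus the mean and variance of the Gaussian slab component."""
    # Marginal likelihood of the observation under the spike (beta_j = 0)
    m_spike = norm.pdf(beta_hat_j, loc=0.0, scale=np.sqrt(sigma2_e))
    # Marginal likelihood under the slab (beta_j ~ N(0, sigma2_1))
    m_slab = norm.pdf(beta_hat_j, loc=0.0, scale=np.sqrt(sigma2_1 + sigma2_e))
    # Posterior inclusion probability via Bayes' rule
    p_causal = (1 - p0) * m_slab / ((1 - p0) * m_slab + p0 * m_spike)
    # Conjugate Gaussian update for the slab component
    post_var = 1.0 / (1.0 / sigma2_1 + 1.0 / sigma2_e)
    post_mean = post_var * beta_hat_j / sigma2_e
    return p_causal, post_mean, post_var
```

The mixture weight `p_causal` plays the role of a posterior inclusion probability; with nonzero LD the same update would instead appear inside a coordinate-ascent loop rather than being exact in one pass.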
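Finally, the BBBVI baseline quoted above (2 particles, Adam with learning rate 10⁻³, 2000 gradient steps) follows the standard reparameterization-gradient recipe. The sketch below uses a single mean-field Gaussian and a placeholder log-density `log_p` to illustrate that loop; it is not the paper's implementation, and the 50 mixture components used in the paper's runs are omitted here.

```python
import math
import torch

# Placeholder target: a standard-normal log-density standing in for the model's log joint.
def log_p(z):
    return -0.5 * (z ** 2).sum(dim=-1)

P = 1000                                          # dimensionality, matching the quoted setup
mu = torch.zeros(P, requires_grad=True)           # variational means
log_sigma = torch.zeros(P, requires_grad=True)    # log of variational standard deviations
opt = torch.optim.Adam([mu, log_sigma], lr=1e-3)  # learning rate 1e-3, as quoted

for step in range(2000):                          # 2000 gradient steps, as quoted
    opt.zero_grad()
    eps = torch.randn(2, P)                       # 2 particles, as quoted
    z = mu + eps * log_sigma.exp()                # reparameterization trick
    # Negative ELBO estimate; the Gaussian entropy term is available in closed form.
    entropy = log_sigma.sum() + 0.5 * P * (1 + math.log(2 * math.pi))
    loss = -(log_p(z).mean() + entropy)
    loss.backward()
    opt.step()
```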