Forward $\chi^2$ Divergence Based Variational Importance Sampling
Authors: Chengrui Li, Yule Wang, Weihan Li, Anqi Wu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, in terms of both log-likelihood and model parameter estimation. |
| Researcher Affiliation | Academia | School of Computational Science & Engineering, Georgia Institute of Technology, Atlanta, GA 30305, USA |
| Pseudocode | Yes | Algorithm 1 (VIS): 1: for i = 1:N do; 2: Sample {z^(k)}_{k=1}^K from q(z\|x; ϕ); 3: Update θ by maximizing ln p̂(x; θ, ϕ) via Eq. 6; 4: Update ϕ by minimizing χ²(p(z\|x; θ) ‖ q(z\|x; ϕ)) via Eq. 12 or Eq. 24; 5: end for. (A hedged Python sketch of this loop is given below the table.) |
| Open Source Code | Yes | Code: https://github.com/JerrySoybean/vis. |
| Open Datasets | Yes | We apply the VAE model on the MNIST dataset (LeCun et al., 1998). We run different methods on a real neural spike train recorded from V = 27 retinal ganglion neurons while a mouse is performing a visual test for about 20 mins (Pillow & Scott, 2012). |
| Dataset Splits | No | The paper specifies train and test sets but does not explicitly mention a separate validation set for any of the experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "Adam (Kingma & Ba, 2014)" as an optimizer, but it does not specify any software libraries or dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.8). |
| Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) as the optimizer and the learning rate is set at 0.002. We run 200 epochs for each method, and in each epoch, 100 batches of size 10 are used for optimization. The number of Monte Carlo samples used for sampling the latent is K = 5000. (These values are collected into a configuration sketch below the table.) |
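
The "Pseudocode" row above summarizes Algorithm 1 (VIS): alternately update θ by maximizing the importance-sampled log-likelihood estimate and update ϕ by minimizing the forward χ² divergence between the posterior and the proposal. The following is a minimal, hypothetical PyTorch sketch of that loop on a toy linear-Gaussian model. The model, the toy dimensions, and the reparameterized χ² surrogate used for the ϕ update are illustrative assumptions, not the authors' released implementation (see https://github.com/JerrySoybean/vis for that).

```python
import math
import torch

D_X, D_Z, K = 5, 2, 100   # data dim, latent dim, Monte Carlo samples (toy values)

# Generative model p(x, z; theta): z ~ N(0, I), x | z ~ N(W z + b, I)
theta = {"W": torch.randn(D_X, D_Z, requires_grad=True),
         "b": torch.zeros(D_X, requires_grad=True)}
# Amortized proposal q(z | x; phi): N(A x + c, diag(exp(log_s))^2)
phi = {"A": torch.zeros(D_Z, D_X, requires_grad=True),
       "c": torch.zeros(D_Z, requires_grad=True),
       "log_s": torch.zeros(D_Z, requires_grad=True)}

opt_theta = torch.optim.Adam(theta.values(), lr=2e-3)
opt_phi = torch.optim.Adam(phi.values(), lr=2e-3)

def log_joint(x, z):
    """log p(x, z; theta), summed over dimensions; z has shape (K, D_Z)."""
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
    log_lik = torch.distributions.Normal(
        z @ theta["W"].T + theta["b"], 1.0).log_prob(x).sum(-1)
    return log_prior + log_lik

def q_dist(x):
    """Proposal distribution q(z | x; phi)."""
    return torch.distributions.Normal(x @ phi["A"].T + phi["c"], phi["log_s"].exp())

def vis_step(x):
    # Step 2: sample z^(1), ..., z^(K) from q(z | x; phi)
    q = q_dist(x)
    z = q.rsample((K,))                                        # (K, D_Z)

    # Step 3: update theta by maximizing the importance-sampling estimate
    # ln p_hat(x) = ln (1/K) sum_k p(x, z^(k); theta) / q(z^(k) | x; phi)
    # (the role played by Eq. 6 in the paper)
    log_w = log_joint(x, z.detach()) - q.log_prob(z.detach()).sum(-1).detach()
    loss_theta = -(torch.logsumexp(log_w, dim=0) - math.log(K))
    opt_theta.zero_grad()
    loss_theta.backward()
    opt_theta.step()

    # Step 4: update phi by minimizing a reparameterized surrogate of the forward
    # chi^2 divergence, log E_q[w^2] (a stand-in for the paper's Eq. 12 / Eq. 24)
    q = q_dist(x)
    z = q.rsample((K,))
    log_w = log_joint(x, z) - q.log_prob(z).sum(-1)
    loss_phi = torch.logsumexp(2.0 * log_w, dim=0)
    opt_phi.zero_grad()
    loss_phi.backward()
    opt_phi.step()

x = torch.randn(D_X)      # one toy observation
for _ in range(200):      # Step 1: for i = 1:N do
    vis_step(x)
```

Detaching z in the θ step keeps that update a pure importance-sampling likelihood step, while the ϕ step backpropagates through the reparameterized samples; the paper's Eq. 12 and Eq. 24 estimators may differ from this simple surrogate in how the χ² gradient is formed.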
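
For reference, the hyperparameters quoted in the "Experiment Setup" row can be collected in one place. Only the values below come from the paper; the dictionary layout is just an assumed way of organizing them.

```python
# Hyperparameters quoted in the "Experiment Setup" row; the key names are assumptions.
experiment_config = {
    "optimizer": "Adam",           # Kingma & Ba, 2014
    "learning_rate": 0.002,
    "epochs": 200,
    "batches_per_epoch": 100,
    "batch_size": 10,
    "num_latent_samples_K": 5000,  # Monte Carlo samples for the latent z
}
```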