The Variational Predictive Natural Gradient

Authors: Da Tang, Rajesh Ranganath

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show our approach outperforms vanilla gradient optimization and the traditional natural gradient optimization on several latent variable models, including Bayesian logistic regression on synthetic data, variational autoencoders (Kingma & Welling, 2014; Rezende et al., 2014) on images, and variational probabilistic matrix factorization (Mnih & Salakhutdinov, 2008; Gopalan et al., 2015; Liang et al., 2016) on movie recommendation data. (Abstract); We explore the empirical performance of variational inference using the VPNG updates in Algorithm 1. We consider Bayesian logistic regression on a synthetic dataset, the VAE on a real handwritten digit dataset, and variational matrix factorization on a real movie recommendation dataset. (Section 5, Experiments)
Researcher Affiliation | Academia | 1) Department of Computer Science, Columbia University, New York, New York, USA; 2) The Courant Institute, New York University, New York, New York, USA.
Pseudocode | Yes | Algorithm 1: Variational inference with VPNGs
Open Source Code | Yes | Code is available at: https://github.com/datang1992/VPNG.
Open Datasets | Yes | We also study VPNGs for variational autoencoders (VAEs) (Kingma & Welling, 2014; Rezende et al., 2014) on binarized MNIST (LeCun et al., 1998). (Page 5); Our third experiment is on MovieLens 20M (Harper & Konstan, 2016). (Page 6)
Dataset Splits | No | We generate 500 samples and select a fixed set which contains 80% of the whole data for training and use the rest for testing. (Page 5); MNIST contains 70,000 images (60,000 for training and 10,000 for testing). (Page 5); We randomly split the data matrix R into train and test sets where the train set contains 90% of the rows of R (it contains ratings from 90% of the users) and the test set contains the remaining rows. (Page 6). No explicit validation split information is provided.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using RMSProp (Tieleman & Hinton, 2012) and Adam (Kingma & Ba, 2014) as learning rate adjustment techniques but does not specify software versions for these or any other key software components.
Experiment Setup | Yes | We use ten Monte Carlo samples to estimate the ELBO, its derivatives, and the variational predictive Fisher information matrix F_r. (Page 4); We use a 100-dimensional latent representation z_i. (Page 5); We use 200 hidden units for both the inference and generative networks. (Page 5); We select a batch size of 600. (Page 5); Here d = 100 is the latent variable dimensionality. ... We use 300 hidden units for this experiment. (Page 6); Since this dataset is larger, we use a batch size of 3000. (Page 6)
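
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected in one place. The snippet below only restates the reported values (ten Monte Carlo samples, 100-dimensional latents, 200 or 300 hidden units, batch sizes of 600 and 3000); the variable and key names are illustrative and do not come from the released code.

```python
# Hyperparameters quoted in the paper's experiment setup (pages 4-6).
# Only the values come from the paper; the names here are illustrative.
VAE_MNIST = dict(latent_dim=100, hidden_units=200, batch_size=600, mc_samples=10)
MATRIX_FACTORIZATION_ML20M = dict(latent_dim=100, hidden_units=300, batch_size=3000, mc_samples=10)
```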
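
To connect that setup to Algorithm 1 (variational inference with VPNGs), here is a minimal sketch of one preconditioned update, assuming the step amounts to solving a damped linear system with a Monte Carlo estimate of the variational predictive Fisher information matrix F_r. The oracles elbo_grad_fn and score_fn, the damping term, and the learning rate are assumptions made for illustration; they are not the authors' implementation, which is available in the repository linked above.

```python
import numpy as np

def vpng_step(lam, elbo_grad_fn, score_fn, lr=1e-3, n_mc=10, damping=1e-3):
    """One sketched VPNG-style update of the parameter vector `lam`.

    elbo_grad_fn(lam, n_samples) -> Monte Carlo estimate of the ELBO gradient
    score_fn(lam)                -> one Monte Carlo sample of the predictive
                                    score vector used to build F_r
    """
    grad = elbo_grad_fn(lam, n_samples=n_mc)

    # Estimate F_r as an average of score outer products over n_mc samples,
    # mirroring the paper's use of ten Monte Carlo samples.
    d = lam.shape[0]
    fisher = np.zeros((d, d))
    for _ in range(n_mc):
        s = score_fn(lam)
        fisher += np.outer(s, s)
    fisher /= n_mc

    # Precondition the ELBO gradient; the damping term keeps the solve stable.
    direction = np.linalg.solve(fisher + damping * np.eye(d), grad)
    return lam + lr * direction

# Toy usage with random stand-ins for the two oracles.
rng = np.random.default_rng(0)
lam = vpng_step(
    np.zeros(5),
    elbo_grad_fn=lambda p, n_samples: rng.normal(size=p.shape),
    score_fn=lambda p: rng.normal(size=p.shape),
)
```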