GO Gradient for Expectation-Based Objectives

Authors: Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine the proposed GO gradients and statistical back-propagation with four experiments: (i) simple one-dimensional (gamma and negative binomial) examples are presented to verify the GO gradient in Theorem 1, corresponding to nonnegative and discrete random variables; (ii) the discrete variational autoencoder experiment from Tucker et al. (2017) and Grathwohl et al. (2017) is reproduced to compare GO with state-of-the-art variance-reduction methods; (iii) a multinomial GAN, generating discrete observations, is constructed to demonstrate the deep GO gradient in Theorem 2; (iv) hierarchical variational inference (HVI) for two deep non-conjugate Bayesian models is developed to verify statistical back-propagation in Theorem 3. (A minimal numerical check of the GO gradient identity for the one-dimensional continuous case is sketched after the table.)
Researcher Affiliation | Academia | Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin; Department of Electrical and Computer Engineering, Duke University
Pseudocode | Yes | Algorithm 1: an algorithm for (34), given as an example to demonstrate how to practically combine GO gradients with deep learning frameworks such as TensorFlow or PyTorch. (A hedged PyTorch sketch of this kind of integration appears after the table.)
Open Source Code | Yes | Code for all experiments can be found at github.com/YulaiCong/GOgradient.
Open Datasets | Yes | Dataset / Model / Training / Validation: MNIST... Omniglot...
Dataset Splits | Yes | Table 1: Best obtained ELBOs for discrete variational autoencoders. Results of REBAR and RELAX are obtained by running the released code from Grathwohl et al. (2017). All methods are run with the same learning rate for 1,000,000 iterations. Dataset / Model / Training / Validation... ELBOs are calculated using all training/validation data.
Hardware Specification | Yes | Experiments presented below were implemented in TensorFlow or PyTorch with a Titan Xp GPU.
Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al.)' and 'PyTorch (Paszke et al., 2017)' as the frameworks used, but it does not specify exact version numbers for these or any other software dependencies, which are required for reproducibility.
Experiment Setup | Yes | Stochastic gradient ascent with one-sample-estimated gradients is used to optimize the objective... All methods are run with the same learning rate for 1,000,000 iterations... The mini-batch size is set to 200. One-sample gradient estimates are used to train the model for the compared methods. For RSVI (Naesseth et al., 2016), the shape augmentation parameter B is set to 5... Let z^(l) ≥ T_z, where T_z = 1e-5 is used in the experiments; let c^(l) ≥ T_c, where T_c = 1e-5; let Φ^(l+1) z^(l+1) ≥ T_α with T_α = 0.2; use a factor to balance the likelihood and prior for each z^(l). (A short sketch of these stability clamps follows the table.)
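
To make the one-dimensional verification in the Research Type row concrete, here is a minimal Monte Carlo check of the GO gradient identity of Theorem 1 for a continuous nonnegative variable. It uses an exponential distribution (a gamma with shape fixed at 1) and f(z) = z purely for illustration; the distribution choice, variable names, and sample count are assumptions, not the authors' setup.

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 1.5                        # rate of q(z; lam) = lam * exp(-lam * z)
    n_samples = 200_000

    # Draw samples from q; numpy parameterizes the exponential by scale = 1/lam.
    z = rng.exponential(scale=1.0 / lam, size=n_samples)

    # GO weight g_lam(z) = -(dQ(z; lam)/dlam) / q(z; lam); with the exponential
    # CDF Q(z; lam) = 1 - exp(-lam * z) this simplifies to -z / lam.
    g = -z / lam

    # For f(z) = z we have df/dz = 1, so the GO estimate of d/dlam E_q[f(z)] is:
    go_grad = np.mean(g * 1.0)

    print("GO estimate:", go_grad)          # close to -1/lam**2
    print("analytic   :", -1.0 / lam**2)    # E_q[z] = 1/lam, so d/dlam = -1/lam**2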
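
The Pseudocode row describes wiring GO gradients into an autodiff framework. The following is a hedged PyTorch sketch of one way such an integration could look, using a custom torch.autograd.Function for the same exponential toy case; the class name GOExponentialSample and the toy objective are illustrative and are not taken from the authors' released code.

    import torch

    class GOExponentialSample(torch.autograd.Function):
        """Sample z ~ Exponential(lam) and backpropagate the GO gradient to lam."""

        @staticmethod
        def forward(ctx, lam):
            z = torch.distributions.Exponential(lam).sample()
            ctx.save_for_backward(lam, z)
            return z

        @staticmethod
        def backward(ctx, grad_z):
            lam, z = ctx.saved_tensors
            # Chain rule through the GO weight: d E[f(z)] / d lam = E[g(z) * df/dz],
            # with g(z) = -z / lam for the exponential case.
            return grad_z * (-z / lam)

    lam = torch.tensor(1.5, requires_grad=True)
    z = GOExponentialSample.apply(lam)
    f = z            # toy objective f(z) = z
    f.backward()     # lam.grad now holds a one-sample GO estimate of -1/lam**2
    print(lam.grad)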
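
The stability thresholds quoted in the Experiment Setup row (T_z = 1e-5, T_c = 1e-5, T_α = 0.2) read as lower-bound clamps on the latent variables and gamma shape parameters. A minimal sketch, assuming PyTorch tensors and placeholder names z_l, c_l, Phi_next, z_next for the quantities in the quote:

    import torch

    # Thresholds quoted in the experiment setup.
    T_z, T_c, T_alpha = 1e-5, 1e-5, 0.2

    def stabilize(z_l, c_l, Phi_next, z_next):
        """Apply the lower-bound clamps described for the HVI experiments."""
        z_l = torch.clamp(z_l, min=T_z)                       # z^(l) >= T_z
        c_l = torch.clamp(c_l, min=T_c)                       # c^(l) >= T_c
        shape = torch.clamp(Phi_next @ z_next, min=T_alpha)   # Phi^(l+1) z^(l+1) >= T_alpha
        return z_l, c_l, shape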