Evaluating the Variance of Likelihood-Ratio Gradient Estimators

Authors: Seiya Tokui, Issei Sato

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to empirically verify Theorem 1 and to demonstrate a procedure to analyze the optimal degree of a given estimator covered by our framework. We use MNIST (LeCun et al., 1998) and Omniglot (Lake et al., 2015) for our experiments.
Researcher Affiliation | Collaboration | Preferred Networks, Tokyo, Japan; The University of Tokyo, Tokyo, Japan; RIKEN, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Algorithm for RAM estimator (4) for discrete z_i. If z_i is continuous, the loop over all the configurations of z_i is replaced by a loop over integration points. (A hedged sketch of this loop structure is given after the table.)
Open Source Code | No | The paper states 'All the methods are implemented with Chainer (Tokui et al., 2015)' but does not provide a link to the authors' own source code or an explicit statement that it is released.
Open Datasets | Yes | We use MNIST (LeCun et al., 1998) and Omniglot (Lake et al., 2015) for our experiments.
Dataset Splits | Yes | For the MNIST dataset, we use the standard split of 60,000 training images and 10,000 test images. The training images are further split into 50,000 images and 10,000 images, the latter of which are used for validation. For the Omniglot dataset, we use the standard split of 24,345 training images and 8,070 test images used in the official implementation of Burda et al. (2015). The training images are further split into 20,288 images and 4,057 images, the latter of which are used for validation. (A minimal split sketch is given after the table.)
Hardware Specification | Yes | Each experiment is done on an Intel(R) Xeon(R) CPU E5-2623 v3 at 3.00 GHz and an NVIDIA GeForce Titan X.
Software Dependencies | No | The paper states 'All the methods are implemented with Chainer (Tokui et al., 2015)' but does not provide specific version numbers for Chainer or any other software dependency.
Experiment Setup | Yes | We used RMSprop (Tieleman & Hinton, 2012) with a minibatch size of 100 to optimize the variational lower bound. We apply a weight decay of the coefficient 0.001 for all parameters. All the weights are initialized with the method of Glorot & Bengio (2010). The learning rate is chosen from {3 × 10^-4, 10^-3, 3 × 10^-3}. (A Chainer-style configuration sketch is given after the table.)
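
The Pseudocode row refers to Algorithm 1, which loops over all configurations of each discrete unit z_i (or over integration points if z_i is continuous). The sketch below only illustrates that per-unit marginalization loop under stated assumptions: binary units, a sampled configuration z, and hypothetical helpers q_prob and objective standing in for the variational posterior and the estimand. It is not the authors' estimator (4).

```python
import numpy as np

def marginalize_each_unit(z, q_prob, objective, num_values=2):
    """Illustrative per-unit marginalization loop (not the paper's estimator (4)).

    z            : sampled configuration of discrete latents, shape (n_units,)
    q_prob(i, v) : hypothetical helper, probability that unit i takes value v
    objective(z) : hypothetical helper, value of the estimand for a fixed configuration
    """
    per_unit = np.zeros(len(z))
    for i in range(len(z)):
        # Loop over all configurations of z_i; for continuous z_i this loop
        # would instead run over integration points, as the algorithm caption notes.
        for v in range(num_values):
            z_v = z.copy()
            z_v[i] = v
            per_unit[i] += q_prob(i, v) * objective(z_v)
    return per_unit
```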
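
For the Dataset Splits row, the 50,000/10,000 train/validation split of the MNIST training set could look like the following. The paper states only the resulting sizes; whether the split is random or sequential, and any seed, is not reported, so the shuffled split and the seed below are assumptions.

```python
import numpy as np

def make_validation_split(x_train, n_valid=10_000, seed=0):
    """Split the 60,000 MNIST training images into 50,000 train / 10,000 validation.

    The shuffling and the seed are assumptions for illustration; the paper only
    gives the resulting sizes (50,000/10,000 for MNIST, 20,288/4,057 for Omniglot).
    """
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x_train))
    return x_train[idx[n_valid:]], x_train[idx[:n_valid]]
```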
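
The Experiment Setup row mentions RMSprop, a minibatch size of 100, weight decay 0.001, and Glorot initialization, all implemented with Chainer. Below is a minimal Chainer-style sketch of how such a configuration might be wired up; the two-layer network, its sizes and activation, and the particular learning rate are placeholders, not the paper's model.

```python
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import initializers, optimizers

class PlaceholderModel(chainer.Chain):
    """Stand-in two-layer network; the paper's variational model is not reproduced here."""
    def __init__(self):
        super().__init__()
        init = initializers.GlorotUniform()  # Glorot & Bengio (2010) initialization
        with self.init_scope():
            self.l1 = L.Linear(784, 200, initialW=init)
            self.l2 = L.Linear(200, 200, initialW=init)

    def __call__(self, x):
        return self.l2(F.tanh(self.l1(x)))

model = PlaceholderModel()
optimizer = optimizers.RMSprop(lr=1e-3)  # learning rate chosen from {3e-4, 1e-3, 3e-3}
optimizer.setup(model)
# Weight decay applied to all parameters (chainer.optimizer_hooks.WeightDecay in newer Chainer).
optimizer.add_hook(chainer.optimizer.WeightDecay(0.001))
batch_size = 100  # minibatch size from the paper
```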