Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Authors: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our estimator on a number of challenging optimization problems. Following Tucker et al. (2017), we begin with a simple toy example to illuminate the potential of our method and then continue to the more relevant problems of optimizing binary VAEs and reinforcement learning. In all tested environments we observe improved performance and sample efficiency using our method. The results of our experiments can be seen in Figure 5 and Table 2.
Researcher Affiliation | Academia | Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud; University of Toronto and Vector Institute; {wgrathwohl, choidami, ywu, roeder, duvenaud}@cs.toronto.edu
Pseudocode | Yes | Algorithm 1 (LAX: Optimizing parameters and a gradient control variate simultaneously) and Algorithm 2 (RELAX: Low-variance control variate optimization for black-box gradient estimation).
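To make the pseudocode concrete, below is a minimal single-Bernoulli sketch of the RELAX estimator (Algorithm 2), not the authors' implementation. It assumes a fixed REBAR-style surrogate c(z) = f(sigmoid(z / lambda)) in place of the learned neural-network c_phi; the name `relax_grad`, the temperature 0.5, and the target 0.45 are illustrative choices, and autograd stands in for the paper's analytic gradient bookkeeping.

```python
import torch

torch.manual_seed(0)

def relax_grad(theta, f, c):
    """Single-sample RELAX estimate of d/dtheta E_{b ~ Bern(sigmoid(theta))}[f(b)].

    theta : scalar tensor with requires_grad=True (the Bernoulli logit)
    f     : black-box function of the hard sample b (treated as non-differentiable)
    c     : differentiable surrogate c_phi applied to relaxed samples
    """
    u = torch.rand(()).clamp(1e-6, 1 - 1e-6)
    v = torch.rand(()).clamp(1e-6, 1 - 1e-6)

    # Relaxed sample z and hard sample b = H(z)
    z = theta + torch.log(u) - torch.log(1.0 - u)
    b = (z > 0).float()

    # Conditional relaxed sample z_tilde ~ p(z | b, theta) (REBAR-style coupling):
    # resample the uniform noise restricted to the region consistent with b
    uprime = 1.0 - torch.sigmoid(theta)  # P(b = 0)
    vpp = b * (uprime + v * (1.0 - uprime)) + (1.0 - b) * v * uprime
    z_tilde = theta + torch.log(vpp) - torch.log(1.0 - vpp)

    # Score function: d/dtheta log p(b | theta) = b - sigmoid(theta)
    score = (b - torch.sigmoid(theta)).detach()

    # ghat = [f(b) - c(z_tilde)] * score + d/dtheta [c(z) - c(z_tilde)]
    reparam_term, = torch.autograd.grad(c(z) - c(z_tilde), theta)
    return (f(b) - c(z_tilde).detach()) * score + reparam_term

# Toy usage: minimize E[(b - t)^2] over the Bernoulli parameter
theta = torch.tensor(0.0, requires_grad=True)
t = 0.45
f = lambda b: (b - t) ** 2
c = lambda z: (torch.sigmoid(z / 0.5) - t) ** 2  # REBAR-style surrogate
g = sum(relax_grad(theta, f, c) for _ in range(1000)) / 1000
# true gradient at theta = 0 is p(1-p)(1-2t) = 0.025
```

In the full method, c_phi would be a neural network whose parameters phi are trained jointly to minimize the estimator's variance (via gradients of ghat squared), which is what distinguishes RELAX from REBAR's fixed relaxation.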
Open Source Code | Yes | Tucker et al. (2018) pointed out a bug in our initially released code for the continuous RL experiments. This issue has been fixed in the publicly available code, and the results presented in this paper were generated with the corrected code.
Open Datasets | Yes | on both the MNIST and Omniglot (Lake et al., 2015) datasets. We test our approach on the CartPole and LunarLander environments as provided by the OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper uses validation data (e.g., 'Highest obtained validation ELBO' and the train/validation curves for REBAR and RELAX in Figure 4), but does not provide specific dataset split information (exact percentages, sample counts, or an explicit splitting methodology) for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software such as TensorFlow and optimizers such as RMSProp and ADAM, but does not provide version numbers for these components or for any other libraries required for replication.
Experiment Setup | Yes | We run all models for 2,000,000 iterations with a batch size of 24. For the REBAR models, we tested learning rates in {.005, .001, .0005, .0001, .00005}. ... Both models were trained with the RMSProp (Tieleman & Hinton, 2012) optimizer, and a reward discount factor of .99 was used. Entropy regularization with a weight of .01 was used to encourage exploration. All models were trained using ADAM (Kingma & Ba, 2015), with β1 = 0.9, β2 = 0.999, and ϵ = 1e-08.
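The quoted hyperparameters can be sketched as an optimizer configuration. This is an illustrative reconstruction, not the authors' code: the placeholder `torch.nn.Linear` module stands in for the paper's models, and the learning rates shown are single values (the paper searched a grid for REBAR).

```python
import torch

# Placeholder module standing in for the paper's models (binary VAE / policy
# network); the architecture here is illustrative, not the paper's.
model = torch.nn.Linear(784, 200)

# VAE experiments: ADAM with beta1 = 0.9, beta2 = 0.999, eps = 1e-08 as quoted;
# lr = .001 is one value from the grid the paper searched over.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

# RL experiments: RMSProp optimizer; the discount factor gamma = 0.99 and the
# entropy regularization weight 0.01 enter the loss, not the optimizer itself.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3)
gamma, entropy_weight = 0.99, 0.01
```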