Backpropagation through the Void: Optimizing control variates for black-box gradient estimation

Authors: Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, David Duvenaud

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our estimator on a number of challenging optimization problems. Following Tucker et al. (2017), we begin with a simple toy example to illuminate the potential of our method and then continue to the more relevant problems of optimizing binary VAEs and reinforcement learning. In all tested environments we observe improved performance and sample efficiency using our method. The results of our experiments can be seen in Figure 5 and Table 2.
Researcher Affiliation | Academia | Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud; University of Toronto and Vector Institute; {wgrathwohl, choidami, ywu, roeder, duvenaud}@cs.toronto.edu
Pseudocode | Yes | Algorithm 1 (LAX: Optimizing parameters and a gradient control variate simultaneously) and Algorithm 2 (RELAX: Low-variance control variate optimization for black-box gradient estimation).
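To make the pseudocode concrete, below is a minimal single-Bernoulli sketch of the RELAX estimator (Algorithm 2), not the authors' implementation. It assumes a fixed REBAR-style surrogate c(z) = f(sigmoid(z / lambda)) in place of the learned neural-network c_phi; the name `relax_grad`, the temperature 0.5, and the target 0.45 are illustrative choices, and autograd stands in for the paper's analytic gradient bookkeeping.

```python
import torch

torch.manual_seed(0)

def relax_grad(theta, f, c):
    """Single-sample RELAX estimate of d/dtheta E_{b ~ Bern(sigmoid(theta))}[f(b)].

    theta : scalar tensor with requires_grad=True (the Bernoulli logit)
    f     : black-box function of the hard sample b (treated as non-differentiable)
    c     : differentiable surrogate c_phi applied to relaxed samples
    """
    u = torch.rand(()).clamp(1e-6, 1 - 1e-6)
    v = torch.rand(()).clamp(1e-6, 1 - 1e-6)

    # Relaxed sample z and hard sample b = H(z)
    z = theta + torch.log(u) - torch.log(1.0 - u)
    b = (z > 0).float()

    # Conditional relaxed sample z_tilde ~ p(z | b, theta) (REBAR-style coupling):
    # resample the uniform noise restricted to the region consistent with b
    uprime = 1.0 - torch.sigmoid(theta)  # P(b = 0)
    vpp = b * (uprime + v * (1.0 - uprime)) + (1.0 - b) * v * uprime
    z_tilde = theta + torch.log(vpp) - torch.log(1.0 - vpp)

    # Score function: d/dtheta log p(b | theta) = b - sigmoid(theta)
    score = (b - torch.sigmoid(theta)).detach()

    # ghat = [f(b) - c(z_tilde)] * score + d/dtheta [c(z) - c(z_tilde)]
    reparam_term, = torch.autograd.grad(c(z) - c(z_tilde), theta)
    return (f(b) - c(z_tilde).detach()) * score + reparam_term

# Toy usage: minimize E[(b - t)^2] over the Bernoulli parameter
theta = torch.tensor(0.0, requires_grad=True)
t = 0.45
f = lambda b: (b - t) ** 2
c = lambda z: (torch.sigmoid(z / 0.5) - t) ** 2  # REBAR-style surrogate
g = sum(relax_grad(theta, f, c) for _ in range(1000)) / 1000
# true gradient at theta = 0 is p(1-p)(1-2t) = 0.025
```

In the full method, c_phi would be a neural network whose parameters phi are trained jointly to minimize the estimator's variance (via gradients of ghat squared), which is what distinguishes RELAX from REBAR's fixed relaxation.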
Open Source Code | Yes | Tucker et al. (2018) pointed out a bug in our initially released code for the continuous RL experiments. This issue has been fixed in the publicly available code, and the results presented in this paper were generated with the corrected code.
Open Datasets | Yes | on both the MNIST and Omniglot (Lake et al., 2015) datasets. We test our approach on the CartPole and LunarLander environments as provided by the OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper uses validation data (e.g., 'Highest obtained validation ELBO' and the train/validation curves for REBAR and RELAX in Figure 4), but does not provide specific dataset split information (exact percentages, sample counts, or an explicit splitting methodology) for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software such as TensorFlow and optimizers such as RMSProp and ADAM, but does not provide version numbers for these components or for any other libraries required for replication.
Experiment Setup | Yes | We run all models for 2,000,000 iterations with a batch size of 24. For the REBAR models, we tested learning rates in {.005, .001, .0005, .0001, .00005}. ... Both models were trained with the RMSProp (Tieleman & Hinton, 2012) optimizer, and a reward discount factor of .99 was used. Entropy regularization with a weight of .01 was used to encourage exploration. All models were trained using ADAM (Kingma & Ba, 2015), with β1 = 0.9, β2 = 0.999, and ϵ = 1e-08.
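The quoted hyperparameters can be sketched as an optimizer configuration. This is an illustrative reconstruction, not the authors' code: the placeholder `torch.nn.Linear` module stands in for the paper's models, and the learning rates shown are single values (the paper searched a grid for REBAR).

```python
import torch

# Placeholder module standing in for the paper's models (binary VAE / policy
# network); the architecture here is illustrative, not the paper's.
model = torch.nn.Linear(784, 200)

# VAE experiments: ADAM with beta1 = 0.9, beta2 = 0.999, eps = 1e-08 as quoted;
# lr = .001 is one value from the grid the paper searched over.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

# RL experiments: RMSProp optimizer; the discount factor gamma = 0.99 and the
# entropy regularization weight 0.01 enter the loss, not the optimizer itself.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3)
gamma, entropy_weight = 0.99, 0.01
```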