Distributed Distributional Deterministic Policy Gradients
Authors: Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state of the art performance. |
| Researcher Affiliation | Industry | DeepMind, London, UK {gabrielbm, mwhoffman, budden, wdabney, horgan, dhruvat, alimuldal, heess, countzero}@google.com |
| Pseudocode | Yes | Algorithm pseudocode for the D4PG algorithm which includes all the above-mentioned modifications can be found in Algorithm 1. Here the actor and critic parameters are updated using stochastic gradient descent with learning rates αt and βt respectively, which are adjusted online using ADAM (Kingma & Ba, 2015). While this pseudocode focuses on the learning process, also shown is pseudocode for actor processes which in parallel fill the replay table with data. (A hedged structural sketch of this learner/actor split appears below the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We first consider evaluating performance on a number of simple, physical control tasks by utilizing a suite of benchmark tasks (Tassa et al., 2018) developed in the MuJoCo physics simulator (Todorov et al., 2012). (A small usage sketch of this suite appears below the table.) |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or predefined split citations) needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software like MuJoCo physics simulator, ApeX framework, and the ADAM optimizer, but does not provide specific version numbers for these components. |
| Experiment Setup | Yes | In all experiments we use a replay table of size R = 1 × 10^6 and only consider behavior policies which add fixed Gaussian noise ϵN(0, 1) to the current online policy; in all experiments we use a value of ϵ = 0.3. For all algorithms we initialize the learning rates for both actor and critic updates to the same value. In the next section we will present a suite of simple control problems for which this value corresponds to α0 = β0 = 1 × 10^−4; for the following, harder problems we set this to a smaller value of α0 = β0 = 5 × 10^−5. Similarly for the control suite we utilize a batch size of M = 256 and for all subsequent problems we will increase this to M = 512. (These values are collected into a configuration sketch below the table.) |
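
The Pseudocode row above describes a learner that updates the actor and critic with Adam while separate actor processes fill the replay table in parallel. Below is a minimal, single-process sketch of that structure, not the authors' implementation: the toy dimensions, the placeholder environment transitions, and the plain (non-distributional, non-prioritized, 1-step) critic loss are simplifying assumptions made only to show the shape of the learner/actor loop.

```python
# Minimal learner/actor sketch in the spirit of D4PG's Algorithm 1 (illustrative only).
import random
from collections import deque

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # assumed toy dimensions, not from the paper
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

# Both parameter sets are adjusted online with Adam, as the paper states.
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

replay = deque(maxlen=1_000_000)  # replay table of size R = 1e6

def actor_step(obs):
    """Actor process: act with fixed Gaussian exploration noise, store the transition."""
    with torch.no_grad():
        action = actor(obs) + 0.3 * torch.randn(act_dim)  # epsilon = 0.3
    # A real actor would step the environment here; random placeholders keep this runnable.
    next_obs, reward = torch.randn(obs_dim), torch.randn(())
    replay.append((obs, action, reward, next_obs))
    return next_obs

def learner_step(batch_size=256, gamma=0.99):
    """Learner process: sample a minibatch and update critic then actor with Adam."""
    batch = random.sample(list(replay), batch_size)
    obs, act, rew, nxt = map(torch.stack, zip(*batch))
    with torch.no_grad():
        target_q = rew.unsqueeze(-1) + gamma * critic(torch.cat([nxt, actor(nxt)], dim=-1))
    critic_loss = (critic(torch.cat([obs, act], dim=-1)) - target_q).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

obs = torch.randn(obs_dim)
for _ in range(300):  # fill the replay table a little before learning starts
    obs = actor_step(obs)
for _ in range(10):
    learner_step()
```

The actual D4PG learner replaces the squared-error target above with a categorical distributional critic loss, uses N-step returns and prioritized sampling, and runs many actor processes in parallel rather than interleaving the two loops in one process.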
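
The Open Datasets row refers to the publicly released control-suite benchmark (Tassa et al., 2018) built on the MuJoCo simulator. Here is a small usage sketch, assuming the benchmark is installed via the dm_control package; the cartpole/swingup choice is illustrative rather than a task singled out by the paper.

```python
# Load one control-suite task and run a random-action episode (illustration only).
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cartpole", task_name="swingup")
action_spec = env.action_spec()

time_step = env.reset()
while not time_step.last():
    # Sample actions uniformly within the bounds; D4PG would query its actor network here.
    action = np.random.uniform(action_spec.minimum, action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
```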
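
The Experiment Setup row reports the main hyperparameters used across the experiments. They are gathered below into a single configuration object for reference; the field names are invented here, and only the numeric values come from the quoted text.

```python
# Hyperparameters quoted in the Experiment Setup row, collected in one place.
from dataclasses import dataclass

@dataclass
class D4PGConfig:
    replay_size: int = 1_000_000      # replay table of size R = 1e6
    exploration_sigma: float = 0.3    # fixed Gaussian noise, epsilon = 0.3
    lr_control_suite: float = 1e-4    # alpha_0 = beta_0 for the simple control problems
    lr_hard_tasks: float = 5e-5       # alpha_0 = beta_0 for the harder problems
    batch_control_suite: int = 256    # minibatch size M for the control suite
    batch_hard_tasks: int = 512       # minibatch size M for subsequent problems

print(D4PGConfig())
```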