Expected Policy Gradients

Authors: Kamil Ciosek, Shimon Whiteson

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results confirming that this new approach to exploration substantially outperforms DPG with Ornstein-Uhlenbeck exploration in four challenging MuJoCo domains. From the Experiments section: While EPG has many potential uses, we focus on empirically evaluating one particular application: exploration driven by the Hessian exponential (as introduced in Algorithm 2 and Lemma 2), replacing the standard Ornstein-Uhlenbeck (OU) exploration in continuous action domains.
Researcher Affiliation | Academia | Kamil Ciosek, Shimon Whiteson, Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford OX1 3QD, {kamil.ciosek,shimon.whiteson}@cs.ox.ac.uk
Pseudocode | Yes | Algorithm 1: Expected Policy Gradients; Algorithm 2: Gaussian Policy Gradients; Algorithm 3: Gaussian Integrals
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | To this end, we applied EPG to four domains modelled with the MuJoCo physics simulator (Todorov, Erez, and Tassa 2012): HalfCheetah-v1, InvertedPendulum-v1, Reacher2d-v1 and Walker2d-v1.
Dataset Splits | No | The paper uses continuous control environments and does not specify explicit training, validation, or test dataset splits in terms of percentages or counts; data is generated dynamically through interaction with the environment.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions software such as the MuJoCo physics simulator and OpenAI Gym but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | The exploration hyperparameters for EPG were σ0 = 0.2 and c = 1.0, where the exploration covariance is σ0 e^(cH). These values were obtained using a grid search from the set {0.2, 0.5, 1} for σ0 and {0.5, 1.0, 2.0} for c over the HalfCheetah-v1 domain. ... For SPG, we used OU exploration and a constant diagonal covariance of 0.2 in the actor update (this approximately corresponds to the average variance of the OU process over time).
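
To make the Research Type and Experiment Setup rows concrete, here is a minimal sketch of the Hessian-exponential exploration covariance σ0 e^(cH), assuming the Hessian H of the critic Q(s, a) with respect to the action is already available (e.g., obtained by differentiating the critic twice with automatic differentiation). Function and variable names are illustrative and are not taken from the authors' code.

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential


def exploration_covariance(hessian, sigma0=0.2, c=1.0):
    """Covariance sigma0 * exp(c * H), with H the Hessian of the critic
    Q(s, a) w.r.t. the action. sigma0 = 0.2 and c = 1.0 are the values
    reported in the paper's experiment setup."""
    H = np.asarray(hessian, dtype=float)
    H = 0.5 * (H + H.T)          # symmetrise against numerical asymmetry
    return sigma0 * expm(c * H)  # positive definite for any symmetric H


# Illustrative use with a made-up Hessian for a 2-D action space.
H = np.array([[-1.0, 0.2],
              [0.2, -0.5]])
cov = exploration_covariance(H)
policy_mean = np.zeros(2)        # stand-in for the policy mean mu(s)
noisy_action = np.random.multivariate_normal(policy_mean, cov)
```

Because the matrix exponential of a symmetric matrix is always positive definite, the resulting covariance is a valid Gaussian covariance regardless of the sign of the critic's curvature.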
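
The OU exploration that this covariance replaces in the DPG/SPG baselines is the standard Ornstein-Uhlenbeck action-noise process. The sketch below uses common DDPG-style parameters; theta and sigma here are assumptions for illustration, not values reported in the paper.

```python
import numpy as np


class OUNoise:
    """Ornstein-Uhlenbeck action noise: dx = theta * (0 - x) dt + sigma * dW.
    theta and sigma are illustrative defaults, not taken from the paper."""

    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1.0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = np.zeros(dim)

    def reset(self):
        self.x = np.zeros_like(self.x)

    def sample(self):
        self.x = (self.x
                  - self.theta * self.x * self.dt
                  + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        return self.x


noise = OUNoise(dim=6)         # e.g., HalfCheetah has a 6-D action space
perturbation = noise.sample()  # added to the deterministic policy's action
```

In the continuous-time limit the stationary variance of this process is sigma^2 / (2 * theta), which is the kind of "average variance over time" that the quoted setup matches with a constant diagonal covariance of 0.2 for SPG.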
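
The four evaluation domains in the Open Datasets row are standard MuJoCo tasks exposed through OpenAI Gym. The loop below is a minimal sketch for instantiating them; it assumes an older Gym release in which the listed -v1 IDs (copied verbatim from the paper) are registered, together with a working mujoco-py installation. Newer Gym/Gymnasium releases use different version suffixes and a different reset/step API.

```python
import gym  # assumes an older OpenAI Gym release where the -v1 MuJoCo IDs exist

# Environment IDs as listed in the paper.
DOMAINS = ["HalfCheetah-v1", "InvertedPendulum-v1", "Reacher2d-v1", "Walker2d-v1"]

for env_id in DOMAINS:
    env = gym.make(env_id)
    obs = env.reset()
    episode_return, done = 0.0, False
    while not done:
        action = env.action_space.sample()          # random actions, just to exercise the env
        obs, reward, done, info = env.step(action)  # old 4-tuple Gym API
        episode_return += reward
    print(env_id, "random-policy return:", episode_return)
    env.close()
```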