Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Expected Policy Gradients
Authors: Kamil Ciosek, Shimon Whiteson
AAAI 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results confirming that this new approach to exploration substantially outperforms DPG with Ornstein-Uhlenbeck exploration in four challenging MuJoCo domains. Experiments: While EPG has many potential uses, we focus on empirically evaluating one particular application: exploration driven by the Hessian exponential (as introduced in Algorithm 2 and Lemma 2), replacing the standard Ornstein-Uhlenbeck (OU) exploration in continuous action domains. |
| Researcher Affiliation | Academia | Kamil Ciosek, Shimon Whiteson Department of Computer Science, University of Oxford Wolfson Building, Parks Road, Oxford OX1 3QD EMAIL |
| Pseudocode | Yes | Algorithm 1 Expected Policy Gradients; Algorithm 2 Gaussian Policy Gradients; Algorithm 3 Gaussian Integrals |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the methodology described. |
| Open Datasets | Yes | To this end, we applied EPG to four domains modelled with the MuJoCo physics simulator (Todorov, Erez, and Tassa 2012): HalfCheetah-v1, InvertedPendulum-v1, Reacher2d-v1 and Walker2d-v1 |
| Dataset Splits | No | The paper uses continuous control environments and does not specify explicit training, validation, or test dataset splits in terms of percentages or counts, as it generates data dynamically through interaction with the environment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like the 'MuJoCo physics simulator' and 'OpenAI Gym' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The exploration hyperparameters for EPG were $\sigma_0 = 0.2$ and $c = 1.0$ where the exploration covariance is $\sigma_0 e^{cH}$. These values were obtained using a grid search from the set {0.2, 0.5, 1} for $\sigma_0$ and {0.5, 1.0, 2.0} for $c$ over the HalfCheetah-v1 domain. ... For SPG5, we used OU exploration and a constant diagonal covariance of 0.2 in the actor update (this approximately corresponds to the average variance of the OU process over time). |
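
The exploration covariance quoted in the Experiment Setup row, σ0·e^{cH}, can be illustrated with a minimal sketch. It is not the authors' code. It assumes that H is a symmetric matrix standing in for the critic's Hessian with respect to the action, and it computes the matrix exponential through an eigendecomposition. The function name `exploration_covariance`, the example matrix `H`, the zero mean action, and the random seed are all hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def exploration_covariance(hessian, sigma0=0.2, c=1.0):
    """Return sigma0 * expm(c * H) for a symmetric matrix H (illustrative only)."""
    h = np.asarray(hessian, dtype=float)
    # Symmetrize to guard against small numerical asymmetries.
    h = 0.5 * (h + h.T)
    eigvals, eigvecs = np.linalg.eigh(h)
    # For symmetric H, expm(c * H) = V diag(exp(c * lambda)) V^T.
    return sigma0 * (eigvecs * np.exp(c * eigvals)) @ eigvecs.T


# Hypothetical 2-D Hessian: sharp curvature in the first action dimension,
# mild positive curvature in the second.
H = np.array([[-2.0, 0.0], [0.0, 0.5]])
cov = exploration_covariance(H)  # defaults match the reported sigma0 and c

# Sample one exploratory action around an assumed zero mean action.
rng = np.random.default_rng(0)
mean_action = np.zeros(2)
action = rng.multivariate_normal(mean_action, cov)

# The nine (sigma0, c) combinations covered by the reported grid search.
grid = list(itertools.product((0.2, 0.5, 1.0), (0.5, 1.0, 2.0)))
```

Because the matrix exponential of a symmetric matrix is always positive definite, the result is a valid covariance whatever the sign of the Hessian's eigenvalues. In this sketch, directions with strongly negative curvature receive less exploration noise than flat or positively curved directions.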