Robust Reinforcement Learning for Continuous Control with Model Misspecification

Authors: Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Todd Hester, Timothy Mann, Martin Riedmiller

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We want to emphasize that, while the theoretical contributions are novel, our most significant contribution is that of the extensive experimental analysis we have performed to analyze the robustness performance of our agent. Specifically: (3) We present experimental results in nine Mujoco domains showing that RE-MPO, SRE-MPO and R-MPO, SR-MPO outperform both E-MPO and MPO respectively.
Researcher Affiliation | Industry | Daniel J. Mankowitz, Nir Levine, Rae Jeong, Abbas Abdolmaleki, Jost Tobias Springenberg, Yuanyuan Shi, Jackie Kay, Timothy Mann, Todd Hester, Martin Riedmiller, DeepMind. {dmankowitz, nirlevine, raejeong, aabdolmaleki, springenberg, yyshi, kayj, timothymann, toddhester, riedmiller}@google.com
Pseudocode | Yes | The pseudo code for the R-MPO, RE-MPO and Soft-Robust Entropy-regularized MPO (SRE-MPO) algorithms can be found in Appendix I (Algorithms 1, 2 and 3 respectively).
Open Source Code | No | The paper does not provide a direct link or explicit statement about releasing the source code for the described methodology. The provided link is for performance videos.
Open Datasets | Yes | We now present experiments on nine different continuous control domains (...) from the DeepMind control suite (Tassa et al., 2018).
Dataset Splits | Yes | Both the robust and non-robust agents are evaluated on a test set of three unseen task perturbations. (...) The chosen values of the uncertainty set and evaluation set for each domain can be found in Appendix H.3. Note that it is common practice to manually select the pre-defined uncertainty set and the unseen test environments. (An illustrative held-out evaluation loop is sketched after this table.)
Hardware Specification | No | The paper discusses the simulated environments (e.g., Mujoco domains, Shadow hand) and general settings like 'training run consists of 30k episodes', but does not specify any actual hardware (GPU/CPU models, memory) used to run these simulations or training.
Software Dependencies | No | The paper provides hyperparameter tables (Table 1 and Table 2) that list various settings for the models and training, but it does not specify software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch versions, specific library versions).
Experiment Setup | Yes | Each training run consists of 30k episodes and the experiments are repeated 5 times. (...) Tables 2 and 1 show the hyperparameters used for the MPO and SVG algorithms. All experiments use a feed-forward two layer neural network with 50 neurons to map the current state of the network to the mean and diagonal covariance of the Gaussian policy. (A minimal sketch of this policy head follows the table.)
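
The "Experiment Setup" row above describes the policy as a two-layer feed-forward network with 50 units that maps the state to the mean and diagonal covariance of a Gaussian. The following is a minimal PyTorch sketch of such a policy head; the activation function, the log-std parameterization, and the clamping bounds are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Sketch of a two-layer (50-unit) Gaussian policy head with diagonal covariance."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 50):
        super().__init__()
        # Two hidden layers of 50 units, as described in the experiment setup.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        # One log-std per action dimension gives a diagonal covariance.
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state: torch.Tensor) -> torch.distributions.Normal:
        h = self.trunk(state)
        # Clamp the standard deviation to keep it positive and bounded (assumed bounds).
        std = torch.exp(self.log_std(h)).clamp(1e-4, 1.0)
        return torch.distributions.Normal(self.mean(h), std)
```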
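The "Dataset Splits" row notes that agents are trained against a pre-defined uncertainty set of task perturbations and evaluated on three unseen perturbations per domain. The loop below is a hedged sketch of such a held-out evaluation; it assumes a gym-style environment interface, and `make_env`, the perturbation values, and the episode count are illustrative names and choices rather than the paper's code.

```python
import numpy as np

def evaluate_on_perturbations(make_env, policy, test_perturbations, episodes=10):
    """Average return of a fixed policy on each held-out task perturbation."""
    mean_returns = {}
    for p in test_perturbations:
        env = make_env(p)  # build the domain with perturbed dynamics (e.g. unseen torso length)
        returns = []
        for _ in range(episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done, _ = env.step(policy(obs))
                total += reward
            returns.append(total)
        mean_returns[p] = float(np.mean(returns))
    return mean_returns
```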