Action Robust Reinforcement Learning and Applications in Continuous Control

Authors: Chen Tessler, Yonathan Efroni, Shie Mannor

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extend our approach to Deep RL, perform extensive evaluation across several MuJoCo (Todorov et al., 2012) environments and show the ability of our approach to produce robust policies. We empirically analyze the differences between the PR-MDP and NR-MDP approaches, and demonstrate their ability to produce robust policies under abrupt perturbations and mass uncertainty. Surprisingly, we observe that even in the absence of perturbations, solving for the action robust criteria results in improved performance. (The two robustness criteria are sketched below the table.)
Researcher Affiliation | Academia | Department of Electrical Engineering, Technion Institute of Technology, Haifa, Israel.
Pseudocode | Yes | Algorithm 1 Probabilistic Robust PI, Algorithm 2 Soft Probabilistic Robust PI (a simplified tabular sketch follows the table).
Open Source Code | Yes | Our code can be found in the following repository: https://github.com/tesslerc/ActionRobustRL
Open Datasets | Yes | We consider several MuJoCo domains (Todorov et al., 2012).
Dataset Splits | No | The paper mentions evaluating trained policies on "unseen domains" or "a range of mass values not encountered during training," and training on "5 random seeds" with evaluation over "100 episodes," but it does not specify explicit train/validation/test dataset splits with percentages or counts. (An illustrative evaluation loop follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions software like MuJoCo, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Table 1. Hyper-parameters considered: α values 0.01, 0.05, 0.1, 0.15 and 0.2; actor update steps N 2, 5, 10 and 20. Specifically, for the PR-MDP we decided to use parameter space noise with α = 0.1 and a ratio of 10:1. (An illustrative configuration summary follows the table.)
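
The PR-MDP and NR-MDP criteria referenced under Research Type can be summarized as follows; this is a paraphrased sketch rather than the paper's exact notation, with π the agent policy, π̄ the adversary policy, and α the mixing weight. In the PR-MDP the adversary takes over the action choice with probability α, while in the NR-MDP the executed action is a convex combination, in action space, of the agent's and adversary's actions:

% PR-MDP: with probability alpha the adversary acts instead of the agent.
\pi^{*}_{\mathrm{PR}} \in \arg\max_{\pi} \min_{\bar{\pi}} \;
  \mathbb{E}^{(1-\alpha)\pi + \alpha\bar{\pi}}
  \Big[ \textstyle\sum_{t} \gamma^{t} r(s_t, a_t) \Big]

% NR-MDP: the executed action mixes the agent's and adversary's actions.
a_t = (1-\alpha)\, a_t^{\pi} + \alpha\, a_t^{\bar{\pi}}, \qquad
\pi^{*}_{\mathrm{NR}} \in \arg\max_{\pi} \min_{\bar{\pi}} \;
  \mathbb{E}\Big[ \textstyle\sum_{t} \gamma^{t} r(s_t, a_t) \Big]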
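
The pseudocode row refers to Algorithm 1 (Probabilistic Robust PI) and Algorithm 2 (Soft Probabilistic Robust PI). Below is a minimal tabular sketch of the alternation underlying the probabilistic robust scheme, assuming known transitions P of shape (S, A, S) and rewards R of shape (S, A): the adversary computes a best response to the current agent policy, and the agent then performs a one-step greedy improvement against the fixed adversary. The function names and the value-iteration inner loop are illustrative choices, not the authors' implementation.

import numpy as np

def adversary_best_response(P, R, gamma, alpha, pi, n_vi=500):
    # Worst-case adversary versus a fixed, deterministic agent policy pi.
    # With probability 1 - alpha the agent's action pi[s] is executed and
    # with probability alpha the adversary's action is executed, so the
    # adversary faces an ordinary MDP with mixed dynamics and rewards and
    # minimizes return via value iteration.
    S, A, _ = P.shape
    idx = np.arange(S)
    V = np.zeros(S)
    for _ in range(n_vi):
        agent_term = R[idx, pi] + gamma * P[idx, pi] @ V            # shape (S,)
        Q_bar = (1 - alpha) * agent_term[:, None] + alpha * (R + gamma * P @ V)
        V = Q_bar.min(axis=1)
    return Q_bar.argmin(axis=1), V

def pr_policy_iteration(P, R, gamma, alpha, n_iters=30):
    S, A, _ = P.shape
    idx = np.arange(S)
    pi = np.zeros(S, dtype=int)        # agent policy (one action per state)
    pi_bar = np.zeros(S, dtype=int)    # adversary policy
    for _ in range(n_iters):
        pi_bar, V = adversary_best_response(P, R, gamma, alpha, pi)
        # One-step greedy improvement of the agent against the fixed adversary.
        adv_term = R[idx, pi_bar] + gamma * P[idx, pi_bar] @ V      # shape (S,)
        Q = (1 - alpha) * (R + gamma * P @ V) + alpha * adv_term[:, None]
        pi = Q.argmax(axis=1)
    return pi, pi_bar

The soft variant (Algorithm 2) differs in that the agent policy is updated only partially toward the greedy policy (a convex mixture controlled by a step size) rather than replaced outright.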
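
As noted in the Dataset Splits row, the evaluation protocol is a robustness sweep rather than a dataset split. The loop below illustrates the kind of procedure described (100 evaluation episodes per mass scaling, repeated for each of the 5 training seeds); the environment ID, the body_mass attribute path, and the pre-0.26 gym step/reset signatures are assumptions on my part, not details taken from the paper or the repository.

import numpy as np
import gym

def evaluate(policy, env_id="Hopper-v2", mass_scale=1.0, episodes=100):
    # Average undiscounted return of a trained policy when all body masses
    # are scaled by mass_scale (possibly a value unseen during training).
    env = gym.make(env_id)
    env.unwrapped.model.body_mass[:] *= mass_scale   # attribute path may vary
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# Example sweep over mass scalings for a list of per-seed policies:
# scores = {scale: [evaluate(p, mass_scale=scale) for p in seed_policies]
#           for scale in np.linspace(0.5, 1.5, 11)}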
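
The sweep reported in the Experiment Setup row can be summarized as a small configuration; the dictionary keys below are illustrative names of my own, not identifiers from the authors' repository.

# Hyper-parameter grid from the paper's Table 1 (key names are illustrative).
sweep = {
    "alpha": [0.01, 0.05, 0.1, 0.15, 0.2],   # adversary mixing coefficient
    "actor_update_steps_N": [2, 5, 10, 20],  # update-step ratio between the two players
}

# Setting the authors report selecting for the PR-MDP experiments.
pr_mdp_setting = {
    "exploration": "parameter space noise",
    "alpha": 0.1,
    "update_ratio": "10:1",
}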