Action Robust Reinforcement Learning and Applications in Continuous Control

Authors: Chen Tessler, Yonathan Efroni, Shie Mannor

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extend our approach to Deep RL, perform extensive evaluation across several MuJoCo (Todorov et al., 2012) environments and show the ability of our approach to produce robust policies. We empirically analyze the differences between the PR-MDP and NR-MDP approaches, and demonstrate their ability to produce robust policies under abrupt perturbations and mass uncertainty. Surprisingly, we observe that even in the absence of perturbations, solving for the action robust criteria results in improved performance. (The two robustness criteria are sketched below the table.)
Researcher Affiliation | Academia | Department of Electrical Engineering, Technion Institute of Technology, Haifa, Israel.
Pseudocode | Yes | Algorithm 1 Probabilistic Robust PI, Algorithm 2 Soft Probabilistic Robust PI (a simplified tabular sketch follows the table).
Open Source Code | Yes | Our code can be found in the following repository: https://github.com/tesslerc/ActionRobustRL
Open Datasets | Yes | We consider several MuJoCo domains (Todorov et al., 2012).
Dataset Splits | No | The paper mentions evaluating trained policies on "unseen domains" or "a range of mass values not encountered during training," and training on "5 random seeds" with evaluation over "100 episodes," but it does not specify explicit train/validation/test dataset splits with percentages or counts. (An illustrative evaluation loop follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions software like MuJoCo, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Table 1. Hyper-parameters considered: α values 0.01, 0.05, 0.1, 0.15 and 0.2; actor update steps N 2, 5, 10 and 20. Specifically, for the PR-MDP we decided to use parameter space noise with α = 0.1 and a ratio of 10:1. (An illustrative configuration summary follows the table.)
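
The PR-MDP and NR-MDP criteria referenced under Research Type can be summarized as follows; this is a paraphrased sketch rather than the paper's exact notation, with π the agent policy, π̄ the adversary policy, and α the mixing weight. In the PR-MDP the adversary takes over the action choice with probability α, while in the NR-MDP the executed action is a convex combination, in action space, of the agent's and adversary's actions:

% PR-MDP: with probability alpha the adversary acts instead of the agent.
\pi^{*}_{\mathrm{PR}} \in \arg\max_{\pi} \min_{\bar{\pi}} \;
  \mathbb{E}^{(1-\alpha)\pi + \alpha\bar{\pi}}
  \Big[ \textstyle\sum_{t} \gamma^{t} r(s_t, a_t) \Big]

% NR-MDP: the executed action mixes the agent's and adversary's actions.
a_t = (1-\alpha)\, a_t^{\pi} + \alpha\, a_t^{\bar{\pi}}, \qquad
\pi^{*}_{\mathrm{NR}} \in \arg\max_{\pi} \min_{\bar{\pi}} \;
  \mathbb{E}\Big[ \textstyle\sum_{t} \gamma^{t} r(s_t, a_t) \Big]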
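
The pseudocode row refers to Algorithm 1 (Probabilistic Robust PI) and Algorithm 2 (Soft Probabilistic Robust PI). Below is a minimal tabular sketch of the alternation underlying the probabilistic robust scheme, assuming known transitions P of shape (S, A, S) and rewards R of shape (S, A): the adversary computes a best response to the current agent policy, and the agent then performs a one-step greedy improvement against the fixed adversary. The function names and the value-iteration inner loop are illustrative choices, not the authors' implementation.

import numpy as np

def adversary_best_response(P, R, gamma, alpha, pi, n_vi=500):
    # Worst-case adversary versus a fixed, deterministic agent policy pi.
    # With probability 1 - alpha the agent's action pi[s] is executed and
    # with probability alpha the adversary's action is executed, so the
    # adversary faces an ordinary MDP with mixed dynamics and rewards and
    # minimizes return via value iteration.
    S, A, _ = P.shape
    idx = np.arange(S)
    V = np.zeros(S)
    for _ in range(n_vi):
        agent_term = R[idx, pi] + gamma * P[idx, pi] @ V            # shape (S,)
        Q_bar = (1 - alpha) * agent_term[:, None] + alpha * (R + gamma * P @ V)
        V = Q_bar.min(axis=1)
    return Q_bar.argmin(axis=1), V

def pr_policy_iteration(P, R, gamma, alpha, n_iters=30):
    S, A, _ = P.shape
    idx = np.arange(S)
    pi = np.zeros(S, dtype=int)        # agent policy (one action per state)
    pi_bar = np.zeros(S, dtype=int)    # adversary policy
    for _ in range(n_iters):
        pi_bar, V = adversary_best_response(P, R, gamma, alpha, pi)
        # One-step greedy improvement of the agent against the fixed adversary.
        adv_term = R[idx, pi_bar] + gamma * P[idx, pi_bar] @ V      # shape (S,)
        Q = (1 - alpha) * (R + gamma * P @ V) + alpha * adv_term[:, None]
        pi = Q.argmax(axis=1)
    return pi, pi_bar

The soft variant (Algorithm 2) differs in that the agent policy is updated only partially toward the greedy policy (a convex mixture controlled by a step size) rather than replaced outright.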
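
As noted in the Dataset Splits row, the evaluation protocol is a robustness sweep rather than a dataset split. The loop below illustrates the kind of procedure described (100 evaluation episodes per mass scaling, repeated for each of the 5 training seeds); the environment ID, the body_mass attribute path, and the pre-0.26 gym step/reset signatures are assumptions on my part, not details taken from the paper or the repository.

import numpy as np
import gym

def evaluate(policy, env_id="Hopper-v2", mass_scale=1.0, episodes=100):
    # Average undiscounted return of a trained policy when all body masses
    # are scaled by mass_scale (possibly a value unseen during training).
    env = gym.make(env_id)
    env.unwrapped.model.body_mass[:] *= mass_scale   # attribute path may vary
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# Example sweep over mass scalings for a list of per-seed policies:
# scores = {scale: [evaluate(p, mass_scale=scale) for p in seed_policies]
#           for scale in np.linspace(0.5, 1.5, 11)}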
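
The sweep reported in the Experiment Setup row can be summarized as a small configuration; the dictionary keys below are illustrative names of my own, not identifiers from the authors' repository.

# Hyper-parameter grid from the paper's Table 1 (key names are illustrative).
sweep = {
    "alpha": [0.01, 0.05, 0.1, 0.15, 0.2],   # adversary mixing coefficient
    "actor_update_steps_N": [2, 5, 10, 20],  # update-step ratio between the two players
}

# Setting the authors report selecting for the PR-MDP experiments.
pr_mdp_setting = {
    "exploration": "parameter space noise",
    "alpha": 0.1,
    "update_ratio": "10:1",
}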