Action Robust Reinforcement Learning and Applications in Continuous Control
Authors: Chen Tessler, Yonathan Efroni, Shie Mannor
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extend our approach to Deep RL, perform extensive evaluation across several MuJoCo (Todorov et al., 2012) environments and show the ability of our approach to produce robust policies. We empirically analyze the differences between the PR-MDP and NR-MDP approaches, and demonstrate their ability to produce robust policies under abrupt perturbations and mass uncertainty. Surprisingly, we observe that even in the absence of perturbations, solving for the action robust criteria results in improved performance. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, Technion Institute of Technology, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 Probabilistic Robust PI, Algorithm 2 Soft Probabilistic Robust PI |
| Open Source Code | Yes | Our code can be found in the following repository: https://github.com/tesslerc/ActionRobustRL |
| Open Datasets | Yes | We consider several MuJoCo domains (Todorov et al., 2012). |
| Dataset Splits | No | The paper mentions evaluating trained policies on "unseen domains" or "a range of mass values not encountered during training," and training on "5 random seeds" with evaluation over "100 episodes," but does not specify explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software like MuJoCo, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 1. Hyper-parameters considered. α values: 0.01, 0.05, 0.1, 0.15 and 0.2. Actor update steps N: 2, 5, 10 and 20. Specifically, for the PR-MDP we decided to use Parameter space noise with α = 0.1 and a ratio of 10:1. |
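
The hyper-parameter values quoted in the Experiment Setup row can be written out as a small search grid. The following is a minimal sketch, not the authors' code: the names `ALPHAS`, `ACTOR_UPDATE_STEPS`, and `hyperparameter_grid` are hypothetical, and it assumes every α is paired with every N, which the quoted text does not state.

```python
from itertools import product

# Values quoted from Table 1 of the paper.
ALPHAS = (0.01, 0.05, 0.1, 0.15, 0.2)
ACTOR_UPDATE_STEPS = (2, 5, 10, 20)


def hyperparameter_grid(alphas=ALPHAS, actor_steps=ACTOR_UPDATE_STEPS):
    """Return one dict per (alpha, N) pair.

    Assumption: a full cross product of the two lists; the paper may
    have explored only a subset of these combinations.
    """
    return [
        {"alpha": alpha, "actor_update_steps": n}
        for alpha, n in product(alphas, actor_steps)
    ]


grid = hyperparameter_grid()
print(len(grid))  # 5 alpha values x 4 step counts = 20 configurations
```

Under that assumption the grid has 20 entries, one of which uses α = 0.1, the value the paper reports choosing for the PR-MDP.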