On the Robustness of Safe Reinforcement Learning under Observational Perturbations
Authors: Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides a pioneer work to investigate the safety and robustness of RL under observational attacks for future safe RL studies. Code is available at: https://github.com/liuzuxin/safe-rl-robustness |
| Researcher Affiliation | Collaboration | Zuxin Liu¹, Zijian Guo¹, Zhepeng Cen¹, Huan Zhang¹, Jie Tan², Bo Li³, Ding Zhao¹; ¹CMU, ²Google Brain, ³UIUC; {zuxinl, zijiang, zcen, huanzhan}@andrew.cmu.edu, jietan@google.com, lbo@illinois.edu, dingzhao@cmu.edu |
| Pseudocode | Yes | Algorithm 1 Adversarial safe RL training meta algorithm |
| Open Source Code | Yes | Code is available at: https://github.com/liuzuxin/safe-rl-robustness |
| Open Datasets | Yes | The simulation environments are from a public available benchmark (Gronauer, 2022). |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) was found for training, validation, or test sets. |
| Hardware Specification | Yes | All the experiments are performed on a server with AMD EPYC 7713 64-Core Processor CPU. For each experiment, we use 4 CPUs to train each agent that is implemented by PyTorch, and the training time varies from 4 hours (Car-Run) to 7 days (Ant-Circle). |
| Software Dependencies | No | The paper mentions "implemented by PyTorch" but does not specify a version number for PyTorch or any other software library used. It also names "Bullet safety gym (Gronauer, 2022)" as the environment, again without version details. |
| Experiment Setup | Yes | The complete hyperparameters used in the experiments are shown in Table 2. We choose larger perturbation range for the Car robot-related tasks because they are simpler and easier to train. |