On the Robustness of Safe Reinforcement Learning under Observational Perturbations

Authors: Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides a pioneering work to investigate the safety and robustness of RL under observational attacks for future safe RL studies. Code is available at: https://github.com/liuzuxin/safe-rl-robustness
Researcher Affiliation | Collaboration | Zuxin Liu¹, Zijian Guo¹, Zhepeng Cen¹, Huan Zhang¹, Jie Tan², Bo Li³, Ding Zhao¹ (¹CMU, ²Google Brain, ³UIUC); {zuxinl, zijiang, zcen, huanzhan}@andrew.cmu.edu, jietan@google.com, lbo@illinois.edu, dingzhao@cmu.edu
Pseudocode | Yes | Algorithm 1: Adversarial safe RL training meta algorithm (a minimal illustrative sketch of such a loop is given after this table)
Open Source Code | Yes | Code is available at: https://github.com/liuzuxin/safe-rl-robustness
Open Datasets | Yes | The simulation environments are from a publicly available benchmark (Gronauer, 2022).
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) was found for training, validation, or test sets.
Hardware Specification | Yes | All the experiments are performed on a server with an AMD EPYC 7713 64-core CPU. For each experiment, we use 4 CPUs to train each agent, which is implemented in PyTorch, and the training time varies from 4 hours (Car-Run) to 7 days (Ant-Circle).
Software Dependencies | No | The paper states that the implementation uses PyTorch but does not specify a version number for PyTorch or any other software library. It also names Bullet-Safety-Gym (Gronauer, 2022) as the environment suite, again without version details.
Experiment Setup | Yes | The complete hyperparameters used in the experiments are shown in Table 2. We choose a larger perturbation range for the Car robot-related tasks because they are simpler and easier to train.
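
The Pseudocode row above refers to the paper's Algorithm 1, an adversarial safe RL training meta algorithm. As a rough illustration of what such a loop can look like, here is a minimal sketch: the agent is trained only on observations corrupted by a bounded adversary, and the adversary is updated in alternation. Everything here (the environment interface, `learner`, `RandomAdversary`, `EPSILON`) is a hypothetical placeholder, not the authors' actual code; their implementation is in the linked repository.

```python
# Minimal sketch of an adversarial safe RL training meta loop.
# NOT the authors' implementation: env, learner, and adversary are
# hypothetical stand-ins used only to show the control flow.
import numpy as np

EPSILON = 0.05  # assumed L-infinity perturbation budget


class RandomAdversary:
    """Simplest possible observational attacker: bounded random noise."""

    def attack(self, obs):
        return np.random.uniform(-EPSILON, EPSILON, size=obs.shape)

    def update(self, trajectory):
        pass  # a learned adversary would be trained here, e.g. to maximize cost


def adversarial_safe_rl_training(env, learner, adversary,
                                 iterations=100, horizon=200):
    for _ in range(iterations):
        obs, trajectory = env.reset(), []
        for _ in range(horizon):
            # The policy only ever sees the corrupted observation.
            corrupted = obs + np.clip(adversary.attack(obs), -EPSILON, EPSILON)
            action = learner.policy(corrupted)
            # Safe RL environments emit a cost signal alongside the reward
            # (assumed interface).
            obs, reward, cost, done = env.step(action)
            trajectory.append((corrupted, action, reward, cost))
            if done:
                break
        learner.update(trajectory)    # e.g., a Lagrangian-style safe RL update
        adversary.update(trajectory)  # alternate adversary training step
    return learner
```

In the paper's setting the adversary is optimized rather than random, so the `update` stub marks where that alternation between policy training and adversary training would occur in the meta algorithm.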