reproducibilityindex.ai

On the Robustness of Safe Reinforcement Learning under Observational Perturbations

Authors: Zuxin Liu, Zijian Guo, Zhepeng Cen, Huan Zhang, Jie Tan, Bo Li, Ding Zhao

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We further propose a robust training framework for safe RL and evaluate it via comprehensive experiments. This paper provides a pioneering work to investigate the safety and robustness of RL under observational attacks for future safe RL studies. Code is available at: https://github.com/liuzuxin/safe-rl-robustness
Researcher Affiliation	Collaboration	Zuxin Liu¹, Zijian Guo¹, Zhepeng Cen¹, Huan Zhang¹, Jie Tan², Bo Li³, Ding Zhao¹ (¹CMU, ²Google Brain, ³UIUC); {zuxinl, zijiang, zcen, huanzhan}@andrew.cmu.edu, jietan@google.com, lbo@illinois.edu, dingzhao@cmu.edu
Pseudocode	Yes	Algorithm 1 Adversarial safe RL training meta algorithm
Open Source Code	Yes	Code is available at: https://github.com/liuzuxin/safe-rl-robustness
Open Datasets	Yes	The simulation environments are from a publicly available benchmark (Gronauer, 2022).
Dataset Splits	No	No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) was found for training, validation, or test sets.
Hardware Specification	Yes	All the experiments are performed on a server with AMD EPYC 7713 64-Core Processor CPU. For each experiment, we use 4 CPUs to train each agent that is implemented by PyTorch, and the training time varies from 4 hours (Car-Run) to 7 days (Ant-Circle).
Software Dependencies	No	The paper mentions "implemented by PyTorch" but does not specify a version number for PyTorch or any other software library used. It also mentions "Bullet safety gym (Gronauer, 2022)" as the environment, but without version details.
Experiment Setup	Yes	The complete hyperparameters used in the experiments are shown in Table 2. We choose a larger perturbation range for the Car-robot-related tasks because they are simpler and easier to train.
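The Pseudocode and Experiment Setup rows refer to an adversarial safe RL training meta-algorithm and to a per-task perturbation range, but this entry reproduces neither. The Python below is a minimal sketch under stated assumptions, not the authors' Algorithm 1 or code from their repository. It assumes real-valued observation vectors, a scalar action, and an attacker that may shift each observation coordinate by at most eps (an l-infinity bound standing in for the "perturbation range"). The random-search attacker, which maximizes a user-supplied score such as an estimated cost, is a simple swapped-in technique, not the paper's attack. The names `project_linf`, `perturb_observation`, and `rollout_under_attack` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Obs = List[float]


def project_linf(delta: Sequence[float], eps: float) -> Obs:
    """Project a perturbation onto the l-infinity ball of radius eps by clipping each coordinate."""
    return [max(-eps, min(eps, d)) for d in delta]


def perturb_observation(
    obs: Sequence[float],
    score: Callable[[Obs], float],
    eps: float,
    n_samples: int = 64,
    rng: Optional[random.Random] = None,
) -> Obs:
    """Hypothetical random-search attacker (an assumption, not the paper's attack).

    Samples candidate perturbations, projects them into the eps-ball, and returns
    the perturbed observation with the highest score (e.g. an estimated cost).
    The unperturbed observation is kept if no candidate scores higher.
    """
    if rng is None:
        rng = random.Random(0)
    best = list(obs)
    best_score = score(best)
    for _ in range(n_samples):
        # Sampling from [-2*eps, 2*eps] and then clipping places some candidates
        # exactly on the boundary of the eps-ball.
        delta = project_linf([rng.uniform(-2 * eps, 2 * eps) for _ in obs], eps)
        candidate = [o + d for o, d in zip(obs, delta)]
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


def rollout_under_attack(
    env_step: Callable[[Obs, float], Tuple[Obs, float, float]],
    policy: Callable[[Obs], float],
    attacker: Callable[[Obs], Obs],
    obs: Obs,
    horizon: int,
) -> List[Tuple[Obs, float, float, float]]:
    """Roll out a policy that only ever sees attacked observations.

    The environment evolves from the true observation; only the agent's view is
    perturbed. Each step records (seen_obs, action, reward, cost), the kind of
    data an adversarial training loop would use to update a safe RL agent.
    """
    trajectory = []
    for _ in range(horizon):
        seen = attacker(obs)
        action = policy(seen)
        next_obs, reward, cost = env_step(obs, action)
        trajectory.append((seen, action, reward, cost))
        obs = next_obs
    return trajectory
```

A full meta-algorithm would alternate such attacked rollouts with policy updates from some safe RL learner; that outer loop is omitted here. The radius `eps` corresponds to the quantity the Experiment Setup row says was set larger for the Car tasks; the actual values are in the paper's Table 2 and are not reproduced here.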