Iterative Reachability Estimation for Safe Reinforcement Learning

Authors: Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines.
Researcher Affiliation | Academia | Milan Ganai, UC San Diego (mganai@ucsd.edu); Zheng Gong, UC San Diego (zhgong@ucsd.edu); Chenning Yu, UC San Diego (chy010@ucsd.edu); Sylvia Herbert, UC San Diego (sherbert@ucsd.edu); Sicun Gao, UC San Diego (sicung@ucsd.edu)
Pseudocode | Yes | Algorithm 1: RESPO Actor Critic
Open Source Code | Yes | To ensure a fair comparison, the primal-dual based approaches and unconstrained Vanilla PPO were implemented based off of the same code base [59].
Open Datasets | No | The paper evaluates on the Safety Gym [30], Safety PyBullet [50], and Safety MuJoCo [51] environments. These are widely used simulation environments rather than static datasets: the paper cites the frameworks themselves but gives no dataset-specific access information (links, DOIs, or formal dataset citations) and does not describe them as publicly available datasets.
Dataset Splits | Yes | Total Env Interactions: 9e6; Number of Seeds per algorithm per experiment: 5.
Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 6 cores.
Software Dependencies | No | The paper states that approaches were implemented "based off of the same code base [59]" (PPO Lagrangian PyTorch) and [60] (OmniSafe), but it does not list version numbers for software dependencies such as Python, PyTorch, or other libraries, which a reproducible description requires.
Experiment Setup | Yes | Table 2: Hyperparameter Settings Details
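The Dataset Splits row gives the run protocol as 9e6 environment interactions and 5 seeds per algorithm per experiment. The minimal sketch below enumerates that run grid and totals the interaction budget. It assumes (the row does not say so) that 9e6 is the budget of a single seeded run; the helpers `run_grid` and `total_interactions` are hypothetical and are not taken from the paper's code base.

```python
from typing import List, Tuple

# ASSUMPTION: "Total Env Interactions 9e6" is read as the budget of ONE seeded run.
ENV_INTERACTIONS_PER_RUN = 9_000_000
NUM_SEEDS = 5  # "Number Seeds per algorithm per experiment 5"


def run_grid(algorithms: List[str], num_seeds: int = NUM_SEEDS) -> List[Tuple[str, int]]:
    """Hypothetical helper: list every (algorithm, seed) pair for one experiment."""
    return [(algo, seed) for algo in algorithms for seed in range(num_seeds)]


def total_interactions(num_runs: int, per_run: int = ENV_INTERACTIONS_PER_RUN) -> int:
    """Hypothetical helper: total environment steps consumed by num_runs runs."""
    return num_runs * per_run


if __name__ == "__main__":
    runs = run_grid(["RESPO"])
    print(runs)  # [('RESPO', 0), ('RESPO', 1), ('RESPO', 2), ('RESPO', 3), ('RESPO', 4)]
    print(total_interactions(len(runs)))  # 45000000
```

Under the per-run reading, the five RESPO seeds for one environment consume 5 × 9e6 = 4.5e7 interactions. If 9e6 is instead the total across all seeds, each run would receive 1.8e6 interactions, so the intended reading matters when matching the paper's compute budget.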