Iterative Reachability Estimation for Safe Reinforcement Learning
Authors: Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines. |
| Researcher Affiliation | Academia | Milan Ganai UC San Diego mganai@ucsd.edu Zheng Gong UC San Diego zhgong@ucsd.edu Chenning Yu UC San Diego chy010@ucsd.edu Sylvia Herbert UC San Diego sherbert@ucsd.edu Sicun Gao UC San Diego sicung@ucsd.edu |
| Pseudocode | Yes | Algorithm 1 RESPO Actor Critic |
| Open Source Code | Yes | To ensure a fair comparison, the primal-dual based approaches and unconstrained Vanilla PPO were implemented based off of the same code base [59]. |
| Open Datasets | No | The paper mentions evaluating on 'Safety Gym [30]', 'Safety PyBullet [50]', and 'Safety MuJoCo [51]' environments. While these are widely used, the paper cites the frameworks/engines themselves and does not provide specific access information (links, DOIs, or formal citations for the *datasets* used within these simulation environments, if applicable), nor does it claim they are publicly available datasets. These are simulation environments rather than static datasets. |
| Dataset Splits | Yes | Total Env Interactions 9e6, Number Seeds per algorithm per experiment 5. |
| Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 6 cores. |
| Software Dependencies | No | The paper mentions implementing approaches 'based off of the same code base [59]' (PPO Lagrangian PyTorch) and '[60]' (OmniSafe). However, it does not list version numbers for software dependencies such as Python, PyTorch, or other libraries used in the experiments, which are needed to reproduce the setup. |
| Experiment Setup | Yes | Table 2: Hyperparameter Settings Details |
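
The Software Dependencies row notes that no library versions are reported. As a minimal sketch of how such versions could be recorded alongside an experiment run, the snippet below queries the installed package metadata. The helper name `capture_versions` and the package list (`torch`, `numpy`, `gym`) are assumptions for illustration only; the paper does not state which packages or versions were used.

```python
import platform
from importlib import metadata


def capture_versions(packages):
    """Map 'python' and each requested package name to its installed version.

    Packages that are not installed are reported as 'not installed'
    rather than raising, so the report is always complete.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Hypothetical package list; replace with the actual dependencies of the code base.
    for name, version in capture_versions(["torch", "numpy", "gym"]).items():
        print(f"{name}=={version}")
```

Saving this output with each run's logs would give readers the version information that the assessment above found missing.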