Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Iterative Reachability Estimation for Safe Reinforcement Learning
Authors: Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, Sicun Gao
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed methods on a diverse suite of safe RL environments from Safety Gym, PyBullet, and MuJoCo, and show the benefits in improving both reward performance and safety compared with state-of-the-art baselines. |
| Researcher Affiliation | Academia | Milan Ganai UC San Diego EMAIL Zheng Gong UC San Diego EMAIL Chenning Yu UC San Diego EMAIL Sylvia Herbert UC San Diego EMAIL Sicun Gao UC San Diego EMAIL |
| Pseudocode | Yes | Algorithm 1 RESPO Actor Critic |
| Open Source Code | Yes | To ensure a fair comparison, the primal-dual based approaches and unconstrained Vanilla PPO were implemented based off of the same code base [59]. |
| Open Datasets | No | The paper mentions evaluating on 'Safety Gym [30]', 'Safety PyBullet [50]', and 'Safety MuJoCo [51]' environments. While these are widely used, the paper cites the frameworks/engines themselves and does not provide specific access information (links, DOIs, or formal citations for the *datasets* used within these simulation environments, if applicable), nor does it claim they are publicly available datasets. These are simulation environments rather than static datasets. |
| Dataset Splits | Yes | Total Env Interactions 9e6, Number Seeds per algorithm per experiment 5. |
| Hardware Specification | Yes | We run our experiments on Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 6 cores. |
| Software Dependencies | No | The paper mentions implementing approaches 'based off of the same code base [59]' (PPO Lagrangian Pytorch) and '[60]' (Omnisafe). However, it does not explicitly list specific version numbers for software dependencies such as Python, PyTorch, or other libraries used for the experiments, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | Table 2: Hyperparameter Settings Details |