Exploring Safer Behaviors for Deep Reinforcement Learning
Authors: Enrico Marchesini, Davide Corsi, Alessandro Farinelli (pp. 7701-7709)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy. We compare a SOS implementation of PPO (Dhariwal et al. 2017) and TD3 (Fujimoto, van Hoof, and Meger 2018) over constrained approaches, namely CPO (Achiam et al. 2017), Lagrangian-PPO (Stooke, Achiam, and Abbeel 2020), and IPO (Liu, Ding, and Liu 2020), in the recent Safety Gym benchmarks (Ray, Achiam, and Amodei 2019). |
| Researcher Affiliation | Academia | Enrico Marchesini*, Davide Corsi*, Alessandro Farinelli; Department of Computer Science, University of Verona; enrico.marchesini@univr.it, davide.corsi@univr.it |
| Pseudocode | Yes | Algorithm 1: Safety-Oriented Search |
| Open Source Code | No | The information is insufficient. The paper neither states explicitly that the code is open-sourced nor links to a repository for the described methodology. |
| Open Datasets | Yes | in the recent Safety Gym benchmarks (Ray, Achiam, and Amodei 2019). We consider six tasks recommended by the authors of Safety Gym as a benchmark for our class of problems. |
| Dataset Splits | No | The information is insufficient. The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce data partitioning into training, validation, and test sets. |
| Hardware Specification | Yes | Data are collected on a RTX 2080, using the hyperparameters reported in the supplemental material. |
| Software Dependencies | No | The information is insufficient. The paper mentions various algorithms and a verification tool (Neurify) but does not provide specific version numbers for any general software dependencies or libraries. |
| Experiment Setup | No | The information is insufficient. The paper states that "Data are collected on a RTX 2080, using the hyperparameters reported in the supplemental material", which indicates that specific experimental setup details such as hyperparameters are not included in the main text. |