Exploring Safer Behaviors for Deep Reinforcement Learning

Authors: Enrico Marchesini, Davide Corsi, Alessandro Farinelli
Pages: 7701-7709

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy. We compare a SOS implementation of PPO (Dhariwal et al. 2017) and TD3 (Fujimoto, van Hoof, and Meger 2018) over constrained approaches, namely CPO (Achiam et al. 2017), Lagrangian-PPO (Stooke, Achiam, and Abbeel 2020), and IPO (Liu, Ding, and Liu 2020), in the recent Safety Gym benchmarks (Ray, Achiam, and Amodei 2019).
Researcher Affiliation | Academia | Enrico Marchesini*, Davide Corsi*, Alessandro Farinelli; Department of Computer Science, University of Verona; enrico.marchesini@univr.it, davide.corsi@univr.it
Pseudocode | Yes | Algorithm 1: Safety-Oriented Search
Open Source Code | No | The information is insufficient. The paper neither explicitly states that its code is open-sourced nor links to a repository for the described method.
Open Datasets | Yes | in the recent Safety Gym benchmarks (Ray, Achiam, and Amodei 2019). We consider six tasks recommended by the authors of Safety Gym as a benchmark for our class of problems.
Dataset Splits | No | The information is insufficient. The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the partitioning of data into training, validation, and test sets.
Hardware Specification | Yes | Data are collected on a RTX 2080, using the hyperparameters reported in the supplemental material.
Software Dependencies | No | The information is insufficient. The paper mentions various algorithms and a verification tool (Neurify) but gives no version numbers for any software dependencies or libraries.
Experiment Setup | No | The information is insufficient. The paper states that "Data are collected on a RTX 2080, using the hyperparameters reported in the supplemental material," which indicates that experimental setup details such as hyperparameters are not included in the main text.
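The Research Type row lists Lagrangian-PPO among the constrained baselines. As a hedged illustration of what "constrained" means for that family (this is not the paper's Safety-Oriented Search, whose details are not reproduced here), the sketch below shows a generic projected dual-ascent update of a Lagrange multiplier against an episodic cost limit, plus a cost-penalized advantage. The function names, the learning rate, the cost limit, and the `(1 + lam)` normalization are illustrative assumptions, not values or code taken from the paper.

```python
def lagrangian_step(lam, episode_costs, cost_limit, lr):
    """Hypothetical helper: one projected dual-ascent step on the multiplier."""
    mean_cost = sum(episode_costs) / len(episode_costs)
    # Raise lambda when the average episodic cost exceeds the limit, lower it
    # otherwise, then project back onto lambda >= 0.
    return max(0.0, lam + lr * (mean_cost - cost_limit))


def penalized_advantage(adv_reward, adv_cost, lam):
    """Hypothetical helper: reward advantage minus the lambda-weighted cost advantage.

    Dividing by (1 + lam) is one common normalization choice (assumed here)
    that keeps the advantage scale bounded as lambda grows.
    """
    return (adv_reward - lam * adv_cost) / (1.0 + lam)


if __name__ == "__main__":
    # Illustrative numbers only: costs above the limit push lambda up.
    lam = lagrangian_step(0.0, [30.0, 40.0], cost_limit=25.0, lr=0.1)
    print(lam, penalized_advantage(2.0, 1.0, lam))
```

The multiplier grows only while the constraint is violated, so the penalty on cost advantages strengthens until the average episodic cost falls back under the limit.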