reproducibilityindex.ai

Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Authors: Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas J Spanos, Adam Wierman, Ming Jin

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25-29% fewer samples than baselines, and reduces training time by 21-38%.
Researcher Affiliation	Academia	Shangding Gu (1, 3), Laixi Shi (2), Yuhao Ding (1), Alois Knoll (3), Costas Spanos (1), Adam Wierman (2), Ming Jin (4). (1) University of California, Berkeley, USA; (2) California Institute of Technology, USA; (3) Technical University of Munich, Germany; (4) Virginia Tech, USA
Pseudocode	Yes	The details of this algorithm are summarized in Algorithm 1 in Appendix B.
Open Source Code	No	The paper provides neither an explicit link to its own open-source code nor a statement of public release. It mentions using the Omnisafe and Safety-MuJoCo benchmarks, but not code for the proposed method.
Open Datasets	Yes	The Omnisafe [32] benchmark is leveraged for primal-dual based methods... Additionally, we use the Safety-MuJoCo [30] benchmark for primal-based methods.
Dataset Splits	No	The paper mentions overall sample steps and training epochs but does not specify explicit train/validation/test dataset splits or ratios.
Hardware Specification	Yes	Experiments in the tasks of Safety-MuJoCo benchmarks are conducted on a Ubuntu 20.04.3 LTS system, with an AMD Ryzen-7-2700X CPU and an NVIDIA GeForce RTX 2060 GPU... Experiments on the tasks of Omnisafe benchmarks are conducted on a Ubuntu 20.04.6 LTS system, with 2 AMD EPYC-7763 CPUs and 6 NVIDIA RTX A6000 GPUs.
Software Dependencies	No	The paper mentions the operating system versions (Ubuntu 20.04.3 LTS and Ubuntu 20.04.6 LTS) but does not provide specific version numbers for other key software libraries or dependencies like Python, PyTorch, or CUDA.
Experiment Setup	Yes	The key parameters used in the tasks of Safety-MuJoCo benchmarks are provided in Table 4, Table 5 and Table 6. [...] The key parameters used on the tasks of Omnisafe benchmarks are provided in Table 5, Table 6, and Table 7.