Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

Authors: Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas J Spanos, Adam Wierman, Ming Jin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Safety-MuJoCo and Omnisafe benchmarks demonstrate that ESPO significantly outperforms existing primal-based and primal-dual-based baselines in terms of reward maximization and constraint satisfaction. Moreover, ESPO achieves substantial gains in sample efficiency, requiring 25–29% fewer samples than baselines, and reduces training time by 21–38%.
Researcher Affiliation | Academia | Shangding Gu1,3, Laixi Shi2, Yuhao Ding1, Alois Knoll3, Costas Spanos1, Adam Wierman2, Ming Jin4; 1University of California, Berkeley, USA; 2California Institute of Technology, USA; 3Technical University of Munich, Germany; 4Virginia Tech, USA
Pseudocode | Yes | The details of this algorithm are summarized in Algorithm 1 in Appendix B.
Open Source Code | No | The paper provides neither an explicit link to its own open-source code nor a statement of public release. It mentions using the Omnisafe and Safety-MuJoCo benchmarks, but not code for the proposed method.
Open Datasets | Yes | The Omnisafe [32] benchmark is leveraged for primal-dual based methods... Additionally, we use the Safety-MuJoCo [30] benchmark for primal-based methods.
Dataset Splits | No | The paper mentions overall sample steps and training epochs but does not specify explicit train/validation/test dataset splits or ratios.
Hardware Specification | Yes | Experiments in the tasks of Safety-MuJoCo benchmarks are conducted on a Ubuntu 20.04.3 LTS system, with an AMD Ryzen-7-2700X CPU and an NVIDIA GeForce RTX 2060 GPU... Experiments on the tasks of Omnisafe benchmarks are conducted on a Ubuntu 20.04.6 LTS system, with 2 AMD EPYC-7763 CPUs and 6 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the operating system versions (Ubuntu 20.04.3 LTS and Ubuntu 20.04.6 LTS) but does not provide specific version numbers for other key software libraries or dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The key parameters used in the tasks of Safety-MuJoCo benchmarks are provided in Table 4, Table 5 and Table 6. [...] The key parameters used on the tasks of Omnisafe benchmarks are provided in Table 5, Table 6, and Table 7.