Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Authors: Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Xueqian Wang, Bo Yuan, Dacheng Tao

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
Researcher Affiliation | Collaboration | (1) Tsinghua University, (2) JD Explore Academy, (3) Peking University
Pseudocode | Yes | Algorithm 1: P3O (Penalized Proximal Policy Optimization); Algorithm 2: Exact Penalized Policy Search Algorithm. (A hedged sketch of the penalized objective these algorithms optimize follows the table.)
Open Source Code | No | The paper only links to a third-party code-base (https://github.com/openai/safety-starter-agents) used for benchmarking, not the specific source code for the proposed P3O algorithm or its implementation.
Open Datasets | Yes | We design and conduct experiments in 2 single-constraint (Circle and Gather), 1 multi-constraint (Navigation) and 1 multi-agent (Simple Spread) safe RL environments respectively, as illustrated in Figure 1. ... The proposed P3O algorithm and FOCOPS [Zhang et al., 2020] are implemented with same rules and tricks on the code-base of Ray et al. [2019] (https://github.com/openai/safety-starter-agents) for benchmarking safe RL algorithms.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for its experiments, as RL environments generate data dynamically.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'code-base of Ray et al. [2019]' but does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper states that 'More information about experiment environments and detailed parameters are provided in the supplementary material,' indicating that specific experimental setup details are not present in the main text.
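
For context on the two algorithms listed in the Pseudocode row: P3O's core move is an exact-penalty reformulation of the constrained policy search problem. The statement below is a sketch reconstructed from the paper's framing, with the usual notation assumed (J_R and J_C the expected return and expected cost of policy pi_theta, d the cost limit, kappa the penalty factor); it is not a verbatim equation from the text:

\max_{\theta}\; J_R(\pi_\theta) \quad \text{s.t.} \quad J_C(\pi_\theta) \le d
\qquad \longrightarrow \qquad
\max_{\theta}\; J_R(\pi_\theta) \;-\; \kappa \,\max\!\big(0,\; J_C(\pi_\theta) - d\big)

Exact-penalty theory guarantees that, for a sufficiently large but finite kappa, the unconstrained penalized problem shares its optimal solutions with the original constrained one, which is what lets a single-objective PPO-style update handle the constraint.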
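Since no official implementation was released (per the Open Source Code row), here is a minimal PyTorch sketch of what a P3O-style penalized clipped surrogate loss could look like, combining the PPO clipping trick with the ReLU penalty above. Every name here (p3o_loss, the kappa default, the argument layout) is a hypothetical illustration under assumed conventions, not the authors' code; the actual hyperparameters live in the paper's supplementary material.

import torch
import torch.nn.functional as F

def p3o_loss(ratio, adv_r, adv_c, ep_cost, cost_limit,
             kappa=20.0, clip_eps=0.2):
    """Illustrative P3O-style penalized clipped surrogate loss.

    ratio      -- pi_theta(a|s) / pi_old(a|s) for sampled actions
    adv_r      -- reward advantage estimates (e.g., from GAE)
    adv_c      -- cost advantage estimates from a separate cost critic
    ep_cost    -- estimated expected episode cost J_C under the old policy
    cost_limit -- constraint threshold d
    kappa      -- exact-penalty coefficient (hypothetical default)
    """
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard PPO clipped surrogate for the reward objective.
    surr_reward = torch.min(ratio * adv_r, clipped * adv_r).mean()
    # Pessimistic (max) clipped surrogate for the cost objective.
    surr_cost = torch.max(ratio * adv_c, clipped * adv_c).mean()
    # ReLU exact penalty: active only when the predicted constraint value
    # J_C(pi_old) plus the surrogate cost change exceeds the limit d.
    penalty = F.relu(surr_cost + ep_cost - cost_limit)
    # Negated because optimizers minimize; we maximize reward minus penalty.
    return -(surr_reward - kappa * penalty)

In a full training loop, adv_r and adv_c would come from two separate critics and ep_cost from Monte Carlo rollouts under the current policy, with the loss minimized by a standard first-order optimizer.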