Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Authors: Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Xueqian Wang, Bo Yuan, Dacheng Tao

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
Researcher Affiliation | Collaboration | (1) Tsinghua University, (2) JD Explore Academy, (3) Peking University
Pseudocode | Yes | Algorithm 1: P3O (Penalized Proximal Policy Optimization); Algorithm 2: Exact Penalized Policy Search Algorithm. (A hedged sketch of the penalized objective these algorithms optimize follows the table.)
Open Source Code | No | The paper only links to a third-party code-base (https://github.com/openai/safety-starter-agents) used for benchmarking, not the specific source code for the proposed P3O algorithm or its implementation.
Open Datasets | Yes | We design and conduct experiments in 2 single-constraint (Circle and Gather), 1 multi-constraint (Navigation) and 1 multi-agent (Simple Spread) safe RL environments respectively, as illustrated in Figure 1. ... The proposed P3O algorithm and FOCOPS [Zhang et al., 2020] are implemented with same rules and tricks on the code-base of Ray et al. [2019] (https://github.com/openai/safety-starter-agents) for benchmarking safe RL algorithms.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for its experiments, as RL environments generate data dynamically.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'code-base of Ray et al. [2019]' but does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | The paper states that 'More information about experiment environments and detailed parameters are provided in the supplementary material,' indicating that specific experimental setup details are not present in the main text.
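
For context on the two algorithms listed in the Pseudocode row: P3O's core move is an exact-penalty reformulation of the constrained policy search problem. The statement below is a sketch reconstructed from the paper's framing, with the usual notation assumed (J_R and J_C the expected return and expected cost of policy pi_theta, d the cost limit, kappa the penalty factor); it is not a verbatim equation from the text:

\max_{\theta}\; J_R(\pi_\theta) \quad \text{s.t.} \quad J_C(\pi_\theta) \le d
\qquad \longrightarrow \qquad
\max_{\theta}\; J_R(\pi_\theta) \;-\; \kappa \,\max\!\big(0,\; J_C(\pi_\theta) - d\big)

Exact-penalty theory guarantees that, for a sufficiently large but finite kappa, the unconstrained penalized problem shares its optimal solutions with the original constrained one, which is what lets a single-objective PPO-style update handle the constraint.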
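Since no official implementation was released (per the Open Source Code row), here is a minimal PyTorch sketch of what a P3O-style penalized clipped surrogate loss could look like, combining the PPO clipping trick with the ReLU penalty above. Every name here (p3o_loss, the kappa default, the argument layout) is a hypothetical illustration under assumed conventions, not the authors' code; the actual hyperparameters live in the paper's supplementary material.

import torch
import torch.nn.functional as F

def p3o_loss(ratio, adv_r, adv_c, ep_cost, cost_limit,
             kappa=20.0, clip_eps=0.2):
    """Illustrative P3O-style penalized clipped surrogate loss.

    ratio      -- pi_theta(a|s) / pi_old(a|s) for sampled actions
    adv_r      -- reward advantage estimates (e.g., from GAE)
    adv_c      -- cost advantage estimates from a separate cost critic
    ep_cost    -- estimated expected episode cost J_C under the old policy
    cost_limit -- constraint threshold d
    kappa      -- exact-penalty coefficient (hypothetical default)
    """
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard PPO clipped surrogate for the reward objective.
    surr_reward = torch.min(ratio * adv_r, clipped * adv_r).mean()
    # Pessimistic (max) clipped surrogate for the cost objective.
    surr_cost = torch.max(ratio * adv_c, clipped * adv_c).mean()
    # ReLU exact penalty: active only when the predicted constraint value
    # J_C(pi_old) plus the surrogate cost change exceeds the limit d.
    penalty = F.relu(surr_cost + ep_cost - cost_limit)
    # Negated because optimizers minimize; we maximize reward minus penalty.
    return -(surr_reward - kappa * penalty)

In a full training loop, adv_r and adv_c would come from two separate critics and ep_cost from Monte Carlo rollouts under the current policy, with the loss minimized by a standard first-order optimizer.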