Constrained Update Projection Approach to Safe Policy Optimization
Authors: Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University, China; (2) School of Artificial Intelligence, Peking University, China; (3) Tsinghua Shenzhen International Graduate School, Tsinghua University, China; (4) Department of Computer Science and Computing, Zhejiang University City College, China; (5) Institute for Artificial Intelligence, Peking University & BIGAI, China |
| Pseudocode | Yes | Due to space limitations, we present all implementation details in Appendix C and Algorithm 1. |
| Open Source Code | Yes | We have opened the source code of CUP at https://github.com/zmsn-2077/CUP-safe-rl. |
| Open Datasets | Yes | We train different robotic agents using five MuJoCo physical simulators [Todorov et al., 2012], which are available through the OpenAI Gym API [Brockman et al., 2016], and Safety Gym [Ray et al., 2019]. |
| Dataset Splits | No | The paper mentions training details but does not explicitly provide information on how data was split into training, validation, and test sets. Reinforcement learning typically involves continuous interaction with an environment rather than predefined dataset splits for training and evaluation in the supervised learning sense. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | Our implementations are based on PyTorch, OpenAI Gym, and Safety Gym. While software names are provided, specific version numbers are not listed. |
| Experiment Setup | Yes | The paper includes a detailed hyperparameter section (Appendix H.1) listing numerical values for parameters such as 'Learning Rate', 'Discount factor (gamma)', 'GAE lambda', 'Clip parameter', 'Value function coefficient', 'Entropy coefficient', 'Epochs per update', 'Mini batch size', and 'Number of iterations'. Further setup details are given in Appendix H.2. |
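The hyperparameter list in Appendix H.1 names a discount factor (gamma) and a GAE lambda, the two parameters of Generalized Advantage Estimation. Below is a minimal pure-Python sketch of that estimator for one trajectory segment. It assumes the critic's value estimates are already available and that the state after the last step is bootstrapped with `last_value` (0.0 if terminal). The function name `gae_advantages` and its signature are hypothetical; this is not code from the CUP repository.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float],
                   values: Sequence[float],
                   gamma: float,
                   lam: float,
                   last_value: float = 0.0) -> List[float]:
    """Generalized Advantage Estimation over one trajectory segment.

    values[t] is the critic's estimate V(s_t); last_value is V(s_T) for the
    state after the final step (use 0.0 if that state is terminal).
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Sweep backwards so that A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
```

With all value estimates at zero and `gamma = lam = 1.0`, the advantages reduce to the reward-to-go, `[3.0, 2.0, 1.0]`. Setting `lam = 0.0` instead yields the one-step TD residuals. Safe RL implementations often apply the same recursion to per-step costs; whether CUP does so, and with which gamma and lambda, should be checked against Appendix H.1.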
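The 'Clip parameter' in the Appendix H.1 list suggests a clipped probability-ratio objective. The sketch below is an illustration only. It computes the standard PPO clipped surrogate, which is not necessarily CUP's exact update. It assumes the per-sample ratios pi_new(a|s) / pi_old(a|s) and advantages are already computed. `clipped_surrogate` is a hypothetical helper name.

```python
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios[i] is pi_new(a_i|s_i) / pi_old(a_i|s_i); the returned value is an
    objective to be maximized (negate it to use as a loss).
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the probability ratio to [1 - eps, 1 + eps].
        r_clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, r_clipped * a)
    return total / len(ratios)


print(clipped_surrogate([1.0, 1.5], [2.0, 1.0], clip_eps=0.2))
```

In this example, the second sample's ratio of 1.5 is clipped to 1.2 because its advantage is positive, so the batch mean is (2.0 + 1.2) / 2 = 1.6. Taking the minimum removes any incentive to push the ratio further outside the clip range in the direction that the advantage rewards.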