reproducibilityindex.ai

Constrained Update Projection Approach to Safe Policy Optimization

Authors: Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction.
Researcher Affiliation	Academia	¹ College of Computer Science and Technology, Zhejiang University, China; ² School of Artificial Intelligence, Peking University, China; ³ Tsinghua Shenzhen International Graduate School, Tsinghua University, China; ⁴ Department of Computer Science and Computing, Zhejiang University City College, China; ⁵ Institute for Artificial Intelligence, Peking University & BIGAI, China
Pseudocode	Yes	Due to the limitation of space, we present all the details of the implementation in Appendix C and Algorithm 1.
Open Source Code	Yes	We have opened the source code of CUP at https://github.com/zmsn-2077/CUP-safe-rl.
Open Datasets	Yes	We train different robotic agents using five MuJoCo physical simulators [Todorov et al., 2012], which are available through the OpenAI Gym API [Brockman et al., 2016], and Safety Gym [Ray et al., 2019].
Dataset Splits	No	The paper mentions training details but does not explicitly provide information on how data was split into training, validation, and test sets. Reinforcement learning typically involves continuous interaction with an environment rather than predefined dataset splits for training and evaluation in the supervised learning sense.
Hardware Specification	Yes	All experiments are conducted on NVIDIA RTX 3090 GPUs.
Software Dependencies	No	Our implementations are based on PyTorch, OpenAI Gym, and Safety Gym. While software names are provided, specific version numbers are not listed.
Experiment Setup	Yes	For more details, see Appendix H.2. The paper includes a detailed hyperparameter section (Appendix H.1) that lists numerical values for settings such as 'Learning Rate', 'Discount factor (gamma)', 'GAE lambda', 'Clip parameter', 'Value function coefficient', 'Entropy coefficient', 'Epochs per update', 'Mini batch size', 'Number of iterations', and others.
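The hyperparameter names listed in the Experiment Setup row can be captured in a small configuration record when attempting a reproduction. The sketch below is a hypothetical Python dataclass: the class name `CUPHyperparameters` and all field names are our own labels, not identifiers from the CUP repository. It deliberately has no default values, because the actual numbers must be copied from Appendix H.1 rather than guessed, and it covers only the named hyperparameters, not the "others" the row mentions.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CUPHyperparameters:
    """Hypothetical record of the hyperparameters named in Appendix H.1.

    Field names are illustrative assumptions, not the CUP codebase's names.
    No defaults are provided: fill every value in from the paper.
    """
    learning_rate: float
    gamma: float            # discount factor
    gae_lambda: float       # GAE lambda
    clip_param: float       # clip parameter
    vf_coef: float          # value function coefficient
    entropy_coef: float     # entropy coefficient
    epochs_per_update: int
    minibatch_size: int
    num_iterations: int

    def __post_init__(self) -> None:
        # Generic sanity checks that hold for any standard setting of these terms;
        # they do not encode any value specific to the paper.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        if not 0.0 <= self.gae_lambda <= 1.0:
            raise ValueError("gae_lambda must lie in [0, 1]")
        for name in ("epochs_per_update", "minibatch_size", "num_iterations"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")

    def to_record(self) -> dict:
        """Return a plain dict suitable for logging alongside a reproduction run."""
        return asdict(self)
```

Keeping the record frozen and default-free means a reproduction script fails loudly if any listed setting is missing, and `to_record()` gives a dictionary that can be stored next to results so the exact configuration is documented.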