reproducibilityindex.ai

Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Authors: Juntao Dai, Jiaming Ji, Long Yang, Qian Zheng, Gang Pan

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We apply our APPO methods in diverse safety-constrained tasks, setting a new state of the art compared with a comprehensive list of safe RL baselines. Extensive experiments verify the merits of our method in easy implementation, stable convergence, and precise cost control.
Researcher Affiliation	Academia	1 The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 2 College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 3 School of Artificial Intelligence, Peking University, Beijing, China
Pseudocode	Yes	We present the pseudo-code of APPO in Algorithm 1.
Open Source Code	No	The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets	Yes	For a comprehensive evaluation, we select four representative tasks from three well-known safe RL benchmark environments (Safe MuJoCo (Zhang, Vuong, and Ross 2020), Safety Gym (Ray, Achiam, and Amodei 2019), and Bullet Safety Gym (Gronauer 2022)) as our experimental scenarios.
Dataset Splits	No	The paper does not provide specific details about train/validation/test dataset splits, such as percentages or sample counts.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch versions) needed for replication.
Experiment Setup	No	The paper describes general aspects of the training process and adaptive hyperparameter adjustment (e.g., for the penalty factor and the multiplier learning rate), but the main text does not provide specific numerical values for common hyperparameters such as learning rate, batch size, or number of epochs.