Reduced Policy Optimization for Continuous Control with Hard Constraints

Authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on these benchmarks demonstrate the superiority of RPO in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
Researcher Affiliation | Academia | Shutong Ding (ShanghaiTech University), Jingya Wang (ShanghaiTech University), Yali Du (King's College London), Ye Shi (ShanghaiTech University)
Pseudocode | Yes | Algorithm 1: Training Procedure of RPO; Algorithm 2: Generalized Reduced Gradient Algorithm; Algorithm 3: RPO-DDPG; Algorithm 4: RPO-SAC. (A hedged sketch of a GRG-style update follows the table.)
Open Source Code | Yes | Our code is available at: https://github.com/wadx2019/rpo.
Open Datasets | Yes | Specifically, our benchmarks are designed based on [12], with extra interfaces that return information about the hard constraints. The data on power demand and day-ahead electricity prices are from [1, 2]. (A hypothetical constraint-interface sketch follows the table.)
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets, beyond the general concept of training within an RL environment.
Hardware Specification | Yes | We implemented our experiments on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
Software Dependencies | Yes | The implementations of the three safe RL algorithms in our experiments are based on omnisafe and safe-explorer, and recommended values are adopted for hyperparameters not mentioned in the following tables.
Experiment Setup | Yes | Parameter tables (Table 4, Table 5, and Table 6) list hyperparameters such as batch size, discount factor, learning rates for the policy and value networks, temperature, and max GRG updates for each experiment. (An illustrative configuration sketch follows the table.)
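
The Pseudocode row above refers to Algorithm 2, a generalized reduced gradient (GRG) procedure. As a rough illustration of the general GRG idea only, not the authors' implementation (whose details are in the paper and repository), the sketch below performs one GRG-style update for equality constraints h(x) = 0: it splits the variables into basic and nonbasic blocks, descends along the reduced gradient in the nonbasic block, and restores feasibility with Newton iterations on the basic block. All function and variable names here are hypothetical.

```python
# Minimal, illustrative GRG-style update for equality constraints h(x) = 0.
# Names and signatures are hypothetical and NOT taken from the RPO code base.
import numpy as np

def grg_step(x, f_grad, h, h_jac, basic_idx, nonbasic_idx,
             step_size=1e-2, max_newton_iters=20, tol=1e-8):
    """Move the nonbasic (independent) variables along the reduced gradient of
    the objective, then restore h(x) = 0 by solving for the basic variables."""
    g = f_grad(x)                      # full objective gradient, shape (n,)
    J = h_jac(x)                       # constraint Jacobian, shape (m, n)
    J_B = J[:, basic_idx]              # square block w.r.t. basic variables
    J_N = J[:, nonbasic_idx]
    # Reduced gradient: grad_N f - J_N^T J_B^{-T} grad_B f
    reduced_grad = g[nonbasic_idx] - J_N.T @ np.linalg.solve(J_B.T, g[basic_idx])

    x_new = x.copy()
    x_new[nonbasic_idx] -= step_size * reduced_grad   # descend in independent vars

    # Restore feasibility by adjusting only the basic variables (Newton steps).
    for _ in range(max_newton_iters):
        residual = h(x_new)
        if np.linalg.norm(residual) < tol:
            break
        J_B = h_jac(x_new)[:, basic_idx]
        x_new[basic_idx] -= np.linalg.solve(J_B, residual)
    return x_new

if __name__ == "__main__":
    # Toy problem: minimize ||x - [2, 2]||^2 subject to x0 + x1 = 1.
    target = np.array([2.0, 2.0])
    f_grad = lambda x: 2.0 * (x - target)
    h = lambda x: np.array([x[0] + x[1] - 1.0])
    h_jac = lambda x: np.array([[1.0, 1.0]])
    x = np.array([1.0, 0.0])
    for _ in range(200):
        x = grg_step(x, f_grad, h, h_jac, basic_idx=[0], nonbasic_idx=[1])
    print(x)  # converges toward the constrained optimum, approximately [0.5, 0.5]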
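
The Open Datasets row mentions benchmarks with extra interfaces that expose hard-constraint information. The wrapper below is only a hypothetical illustration of what such an interface could look like for a Gym-style environment; the class name, constructor arguments, and info keys are assumptions, not the actual API of the RPO benchmarks.

```python
# Hypothetical sketch: attach hard-constraint evaluations to each step's info
# dict. This is NOT the RPO benchmark API, only an illustration of the idea.
import gymnasium as gym


class HardConstraintInfoWrapper(gym.Wrapper):
    """Report equality/inequality constraint values alongside each transition."""

    def __init__(self, env, eq_constraint_fn, ineq_constraint_fn):
        super().__init__(env)
        self._eq_fn = eq_constraint_fn      # h(obs, action) -> array, target: = 0
        self._ineq_fn = ineq_constraint_fn  # g(obs, action) -> array, target: <= 0

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        info["eq_constraints"] = self._eq_fn(obs, action)
        info["ineq_constraints"] = self._ineq_fn(obs, action)
        return obs, reward, terminated, truncated, info
```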
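
The Experiment Setup row lists the hyperparameter fields reported in Tables 4-6. The dictionary below merely mirrors those field names in code form; every value is an illustrative placeholder, not a number taken from the paper, so consult the tables or the repository for the actual settings.

```python
# Hypothetical configuration mirroring the hyperparameter fields in Tables 4-6.
# All VALUES are placeholders; see https://github.com/wadx2019/rpo for real ones.
rpo_config = {
    "batch_size": 256,          # placeholder
    "discount_factor": 0.99,    # placeholder
    "policy_lr": 3e-4,          # policy-network learning rate (placeholder)
    "value_lr": 3e-4,           # value-network learning rate (placeholder)
    "temperature": 0.2,         # entropy temperature, SAC variant (placeholder)
    "max_grg_updates": 10,      # cap on GRG iterations per action (placeholder)
}
```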