Reduced Policy Optimization for Continuous Control with Hard Constraints
Authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on these benchmarks demonstrate the superiority of RPO in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints. |
| Researcher Affiliation | Academia | Shutong Ding1 Jingya Wang1 Yali Du2 Ye Shi1 1ShanghaiTech University 2King's College London |
| Pseudocode | Yes | Algorithm 1: Training Procedure of RPO; Algorithm 2: Generalized Reduced Gradient Algorithm; Algorithm 3: RPO-DDPG; Algorithm 4: RPO-SAC |
| Open Source Code | Yes | Our code is available at: https://github.com/wadx2019/rpo. |
| Open Datasets | Yes | Specifically, our benchmarks are designed based on [12], with extra interfaces to return the information of the hard constraints. The data on power demand and day-ahead electricity prices are from [1, 2]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets, beyond the general concept of training within an RL environment. |
| Hardware Specification | Yes | We implemented our experiments on a GPU of NVIDIA GeForce RTX 3090 with 24GB. |
| Software Dependencies | Yes | The implementations of the three safe RL algorithms in our experiments are based on omnisafe and safe-explorer, and recommended values are adopted for hyper-parameters not mentioned in the following tables. |
| Experiment Setup | Yes | Parameter tables (Table 4, Table 5, and Table 6) list various hyperparameters such as Batch Size, Discount Factor, Learning Rates for Policy and Value Networks, Temperature, and Max GRG Updates for each experiment. |
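The pseudocode row above lists a Generalized Reduced Gradient (GRG) algorithm as a core component of RPO. As a rough illustration of the GRG idea (not the paper's implementation), the sketch below runs GRG-style updates on a hypothetical toy problem: minimize f(x, y) = x² + 2y² subject to the equality constraint x + y = 1. The variable partition, objective, step size, and iteration count are all illustrative assumptions.

```python
# Hedged sketch of generalized-reduced-gradient (GRG) updates on a toy
# equality-constrained problem; NOT the paper's RPO implementation.
# Problem (assumed for illustration): min f(x, y) = x^2 + 2*y^2  s.t.  x + y = 1.

def grg_step(x, lr=0.1):
    """One GRG-style update: eliminate the basic variable y via the
    constraint, descend along the reduced gradient in x, then restore
    exact feasibility."""
    y = 1.0 - x                      # basic variable solved from x + y = 1
    # Reduced gradient: df/dx + (df/dy) * (dy/dx), with dy/dx = -1
    reduced_grad = 2.0 * x + 4.0 * y * (-1.0)
    x_new = x - lr * reduced_grad    # gradient step in the nonbasic variable
    return x_new, 1.0 - x_new        # y recomputed so the constraint holds exactly

x, y = 0.0, 1.0
for _ in range(100):
    x, y = grg_step(x)
# Analytic optimum of the toy problem: x = 2/3, y = 1/3, and x + y = 1 throughout.
print(round(x, 3), round(y, 3))
```

Eliminating y through the constraint keeps every iterate exactly feasible, which mirrors why GRG-style updates are attractive for RL with hard constraints: the policy never proposes an infeasible action during training.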