Multi-Constraint Deep Reinforcement Learning for Smooth Action Control
Authors: Guangyuan Zou, Ying He, F. Richard Yu, Longquan Chen, Weike Pan, Zhong Ming
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulation results show that the proposed MCPPO method has better action smoothness compared with the traditional proportional-integral-differential (PID) and mainstream DRL algorithms. The video is available at https://youtu.be/F2jpaSm7YOg. |
| Researcher Affiliation | Academia | Guangyuan Zou1,2, Ying He1,2, F. Richard Yu1,2, Longquan Chen1,2, Weike Pan1 and Zhong Ming1. 1College of Computer Science and Software Engineering, Shenzhen University, P.R. China; 2Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China. 1900271046@email.szu.edu.cn, {heying, yufei}@szu.edu.cn, chenlongquan2019@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn |
| Pseudocode | Yes | Finally, please refer to Appendix B for the details of the MCPPO pseudo-code. |
| Open Source Code | Yes | https://github.com/GyChou/mcppo (ElegantRL for CARLA). |
| Open Datasets | No | The paper mentions designing environments in CARLA, an open urban driving simulator. However, it does not provide concrete access information (link, DOI, citation) to a specific dataset used for training that is publicly available. |
| Dataset Splits | No | The paper describes training and evaluation steps, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'CARLA' and 'ElegantRL' but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | The decision interval t is 0.2 s during training, but t is 0.05 s and the horizon H is expanded 4 times during evaluation, for comparison with PID controllers. The target speed of the PID controller is set equal to the desired speed of the DRL algorithms. All DRL hyperparameters are provided in Appendix C.1. |
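
The Experiment Setup row implies a concrete evaluation protocol: a 0.2 s decision interval during training, a 0.05 s interval with the horizon H expanded 4 times during evaluation, and a PID target speed matched to the DRL desired speed. Below is a minimal Python sketch of how these settings and a simple action-smoothness proxy could be wired up for a reproduction attempt; the constant names, the random placeholder rollouts, and the mean-absolute-difference metric are illustrative assumptions, not taken from the authors' released code or the paper's exact metric.

```python
import numpy as np

# Assumed evaluation settings reconstructed from the paper's description.
TRAIN_DT = 0.2    # decision interval (s) used during training
EVAL_DT = 0.05    # finer decision interval (s) used during evaluation
HORIZON_SCALE = round(TRAIN_DT / EVAL_DT)  # H expanded 4x so episode wall-clock length is unchanged


def action_smoothness(actions: np.ndarray, dt: float) -> float:
    """Mean absolute change of the action signal per second (lower is smoother).

    This is a generic smoothness proxy; the paper may define its
    smoothness measure differently, so treat this as a stand-in.
    """
    diffs = np.abs(np.diff(actions, axis=0))
    return float(diffs.mean() / dt)


# Example: compare a DRL policy's steering trace against a PID controller's
# trace, both recorded at the 0.05 s evaluation interval. The traces here are
# random placeholders standing in for real CARLA rollouts.
drl_steering = np.random.uniform(-1.0, 1.0, size=(400, 1))
pid_steering = np.random.uniform(-1.0, 1.0, size=(400, 1))
print("DRL smoothness:", action_smoothness(drl_steering, EVAL_DT))
print("PID smoothness:", action_smoothness(pid_steering, EVAL_DT))
```

With real rollouts substituted for the placeholders, this kind of script would let a reproducer check the paper's smoothness comparison under the same decision-interval convention described above.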