Multi-Constraint Deep Reinforcement Learning for Smooth Action Control

Authors: Guangyuan Zou, Ying He, F. Richard Yu, Longquan Chen, Weike Pan, Zhong Ming

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive simulation results show that the proposed MCPPO method has better action smoothness compared with the traditional proportional-integral-differential (PID) and mainstream DRL algorithms. The video is available at https://youtu.be/F2jpaSm7YOg.
Researcher Affiliation | Academia | Guangyuan Zou 1,2, Ying He 1,2, F. Richard Yu 1,2, Longquan Chen 1,2, Weike Pan 1 and Zhong Ming 1. 1 College of Computer Science and Software Engineering, Shenzhen University, P.R. China; 2 Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China. 1900271046@email.szu.edu.cn, {heying, yufei}@szu.edu.cn, chenlongquan2019@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn
Pseudocode | Yes | Finally, please refer to Appendix B for the details of the MCPPO pseudo-code.
Open Source Code | Yes | https://github.com/GyChou/mcppo_ElegantRL_for_Carla
Open Datasets | No | The paper mentions designing environments in CARLA, an open urban driving simulator. However, it does not provide concrete access information (link, DOI, citation) for a publicly available dataset used for training.
Dataset Splits | No | The paper describes training and evaluation steps, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions CARLA and ElegantRL but does not provide specific version numbers for these or any other ancillary software components.
Experiment Setup | Yes | The decision interval t is 0.2 s during training, but t is 0.05 s and the horizon H is expanded 4 times during evaluation, for comparison with PID controllers. The target speed of the PID controller is set equal to the desired speed of the DRL algorithms. All DRL hyperparameters are provided in Appendix C.1. (A configuration sketch follows the table.)
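
The experiment-setup row can be summarized in a small configuration sketch. The snippet below is a minimal, hypothetical illustration of the reported settings; the names (TRAIN_DT, TRAIN_HORIZON, DESIRED_SPEED, pid_config) and the concrete horizon and speed values are assumptions for illustration, not taken from the authors' code.

```python
# Minimal sketch of the train/eval timing setup reported in the table above.
# Only the 0.2 s / 0.05 s intervals and the 4x horizon expansion come from
# the paper; all other names and values are illustrative assumptions.

TRAIN_DT = 0.2        # decision interval (s) used during training
EVAL_DT = 0.05        # finer decision interval (s) used during evaluation
TRAIN_HORIZON = 200   # illustrative episode horizon H during training

# The evaluation interval is 4x shorter, so the horizon H is expanded 4x
# to cover the same amount of simulated time per episode.
EVAL_HORIZON = TRAIN_HORIZON * round(TRAIN_DT / EVAL_DT)

DESIRED_SPEED = 6.0   # m/s, illustrative desired speed of the DRL policy

# The PID baseline tracks the same target speed as the DRL agent, so the
# smoothness comparison is made against an equal set-point.
pid_config = {
    "target_speed": DESIRED_SPEED,
    "dt": EVAL_DT,
}

print(f"eval horizon = {EVAL_HORIZON} steps at dt = {EVAL_DT} s")
```
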