Multi-Constraint Deep Reinforcement Learning for Smooth Action Control
Authors: Guangyuan Zou, Ying He, F. Richard Yu, Longquan Chen, Weike Pan, Zhong Ming
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulation results show that the proposed MCPPO method has better action smoothness compared with the traditional proportional-integral-differential (PID) and mainstream DRL algorithms. The video is available at https://youtu.be/F2jpaSm7YOg. |
| Researcher Affiliation | Academia | Guangyuan Zou1,2, Ying He1,2, F. Richard Yu1,2, Longquan Chen1,2, Weike Pan1 and Zhong Ming1. 1College of Computer Science and Software Engineering, Shenzhen University, P.R. China; 2Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China. 1900271046@email.szu.edu.cn, {heying, yufei}@szu.edu.cn, chenlongquan2019@email.szu.edu.cn, {panweike, mingz}@szu.edu.cn |
| Pseudocode | Yes | Finally, please refer to Appendix B for the details of the MCPPO pseudo-code. |
| Open Source Code | Yes | https://github.com/GyChou/mcppo (ElegantRL for CARLA). |
| Open Datasets | No | The paper mentions designing environments in CARLA, an open urban driving simulator. However, it does not provide concrete access information (link, DOI, citation) to a specific dataset used for training that is publicly available. |
| Dataset Splits | No | The paper describes training and evaluation steps, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'CARLA' and 'ElegantRL' but does not provide specific version numbers for these or any other ancillary software components. |
| Experiment Setup | Yes | The decision interval t is 0.2 s during training, but t is 0.05 s and the horizon H is expanded 4 times during evaluation, for comparison with PID controllers. The target speed of the PID controller is set equal to the desired speed of the DRL algorithms. All DRL hyperparameters are provided in Appendix C.1. |
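
The Experiment Setup row implies a concrete evaluation protocol: a 0.2 s decision interval during training, a 0.05 s interval with the horizon H expanded 4 times during evaluation, and a PID target speed matched to the DRL desired speed. Below is a minimal Python sketch of how these settings and a simple action-smoothness proxy could be wired up for a reproduction attempt; the constant names, the random placeholder rollouts, and the mean-absolute-difference metric are illustrative assumptions, not taken from the authors' released code or the paper's exact metric.

```python
import numpy as np

# Assumed evaluation settings reconstructed from the paper's description.
TRAIN_DT = 0.2    # decision interval (s) used during training
EVAL_DT = 0.05    # finer decision interval (s) used during evaluation
HORIZON_SCALE = round(TRAIN_DT / EVAL_DT)  # H expanded 4x so episode wall-clock length is unchanged


def action_smoothness(actions: np.ndarray, dt: float) -> float:
    """Mean absolute change of the action signal per second (lower is smoother).

    This is a generic smoothness proxy; the paper may define its
    smoothness measure differently, so treat this as a stand-in.
    """
    diffs = np.abs(np.diff(actions, axis=0))
    return float(diffs.mean() / dt)


# Example: compare a DRL policy's steering trace against a PID controller's
# trace, both recorded at the 0.05 s evaluation interval. The traces here are
# random placeholders standing in for real CARLA rollouts.
drl_steering = np.random.uniform(-1.0, 1.0, size=(400, 1))
pid_steering = np.random.uniform(-1.0, 1.0, size=(400, 1))
print("DRL smoothness:", action_smoothness(drl_steering, EVAL_DT))
print("PID smoothness:", action_smoothness(pid_steering, EVAL_DT))
```

With real rollouts substituted for the placeholders, this kind of script would let a reproducer check the paper's smoothness comparison under the same decision-interval convention described above.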