Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

Authors: Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo."
Researcher Affiliation | Collaboration | 1) Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University; 2) Peng Cheng Laboratory; 3) Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute
Pseudocode | Yes | Algorithm 1: CVaR Proximal Policy Optimization (CPPO). (A hedged sketch of an empirical CVaR estimate appears after this table.)
Open Source Code | No | "The implementation of all code, including CPPO and baselines, are based on the codebase Spinning Up." (This indicates the authors built on an existing codebase; it does not mean they released code specific to this paper.)
Open Datasets | Yes | "We choose MuJoCo [Todorov et al., 2012] as our experimental environment. As a robotic locomotion simulator, MuJoCo has lots of different continuous control tasks like Ant, HalfCheetah, Walker2d, Swimmer and Hopper, which are widely used for the evaluation of RL algorithms."
Dataset Splits | No | The paper describes training and evaluation but does not specify explicit train/validation/test splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory used to run the experiments.
Software Dependencies | No | The paper mentions using Adam for optimization and that the code is "based on the codebase Spinning Up", but it does not give version numbers for these or any other software dependencies.
Experiment Setup | No | The 'Experiment Setup' section (5.1) describes the environments, baselines, and evaluation strategies, but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or system-level training configurations in the main text.
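The Pseudocode row above refers to Algorithm 1, CVaR Proximal Policy Optimization (CPPO), which constrains the CVaR of the return distribution. As a minimal, hypothetical sketch (not the authors' implementation, which this page notes was not released), the empirical lower-tail CVaR of a batch of episode returns can be estimated and turned into a soft penalty roughly as follows; the risk level `alpha`, the threshold, and the penalty weight are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Estimate the lower-tail CVaR of a batch of episode returns:
    the mean of the worst alpha-fraction of returns.
    `alpha` is an illustrative risk level, not a value from the paper."""
    returns = np.asarray(returns, dtype=np.float64)
    var = np.quantile(returns, alpha)       # Value-at-Risk: the alpha-quantile of returns
    tail = returns[returns <= var]          # worst-case tail of the return distribution
    return tail.mean() if tail.size > 0 else var

# Hypothetical usage: penalize a PPO-style objective when the tail return
# falls below a chosen threshold (threshold and weight are illustrative).
episode_returns = np.random.normal(loc=1000.0, scale=200.0, size=256)
cvar = empirical_cvar(episode_returns, alpha=0.1)
threshold, penalty_weight = 600.0, 1.0
penalty = penalty_weight * max(0.0, threshold - cvar)
print(f"CVaR_0.1 = {cvar:.1f}, constraint penalty = {penalty:.1f}")
```

In a full CPPO-style training loop, such a CVaR estimate would be computed from rollouts collected each iteration and used to constrain or penalize the policy update; the exact constrained optimization procedure is given in the paper's Algorithm 1.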