Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

Authors: Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung, Kyungjae Lee, Songhwai Oh

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, the proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints. The experiments are conducted on the Safety Gymnasium tasks [14] with a single constraint and the legged robot locomotion tasks [15] with multiple constraints.
Researcher Affiliation | Academia | ¹Dep. of Electrical and Computer Engineering, Seoul National University; ²Dep. of Statistics, Korea University
Pseudocode | Yes | An overview of SRCPO is presented in Algorithm 1, and a detailed pseudo-code of the proposed method is described in Algorithm 2.
Open Source Code | Yes | Our code is available at https://github.com/rllab-snu/Spectral-Risk-Constrained-RL.
Open Datasets | Yes | The experiments are conducted on the Safety Gymnasium tasks [14] with a single constraint and the legged robot locomotion tasks [15] with multiple constraints.
Dataset Splits | No | The paper describes the tasks and data collection but does not provide explicit training, validation, and test dataset splits with percentages or counts.
Hardware Specification | Yes | All experiments were conducted on a PC whose CPU and GPU are an Intel Xeon E5-2680 and NVIDIA TITAN Xp, respectively.
Software Dependencies | No | The paper mentions software components such as quantile distributional critics and a truncated normal distribution, but does not provide version numbers for the libraries or frameworks used (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | The hyperparameters and network structure of each algorithm are detailed in Appendix D; Table 1 gives the network structures and Table 2 describes the hyperparameter settings.
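The Open Datasets row above points to the Safety Gymnasium benchmark, which is distributed as a set of simulated environments rather than a static dataset. Below is a minimal sketch of how such an environment is typically created and rolled out. The package name `safety_gymnasium`, the task id `SafetyPointGoal1-v0`, and the step signature with a separate cost signal are assumptions based on the publicly documented Safety Gymnasium API, not details quoted from the paper.

```python
# Minimal usage sketch for the Safety Gymnasium benchmark cited in the
# Open Datasets row. The package name, task id, and step signature are
# assumptions based on the public Safety Gymnasium API; the paper only
# states that Safety Gymnasium tasks with a single constraint are used.
import safety_gymnasium

env = safety_gymnasium.make("SafetyPointGoal1-v0")
obs, info = env.reset(seed=0)

episode_return, episode_cost = 0.0, 0.0
for _ in range(1000):
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, cost, terminated, truncated, info = env.step(action)
    episode_return += reward
    episode_cost += cost  # constraint signal consumed by RCRL methods
    if terminated or truncated:
        obs, info = env.reset()

env.close()
print(f"return: {episode_return:.2f}, cost: {episode_cost:.2f}")
```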
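The Software Dependencies row notes that the method relies on quantile distributional critics. For context, the sketch below shows how a spectral risk measure can be estimated from a quantile critic's outputs, following the standard definition rho_sigma(Z) = ∫_0^1 sigma(u) F_Z^{-1}(u) du with a non-negative, non-decreasing spectrum sigma that integrates to 1; CVaR at level alpha is the special case sigma(u) = 1[u >= alpha] / (1 - alpha). This is an illustrative NumPy sketch, not the authors' implementation; the function names and the toy quantile values are hypothetical.

```python
# Illustrative sketch (not the authors' code): estimating a spectral risk
# measure of a cost return from the output of a quantile distributional
# critic evaluated at the quantile midpoints tau_i = (i + 0.5) / N.
import numpy as np

def spectral_risk(quantiles: np.ndarray, sigma) -> float:
    """Discretized spectral risk from N sorted quantile estimates."""
    n = len(quantiles)
    taus = (np.arange(n) + 0.5) / n                    # quantile midpoints
    weights = np.array([sigma(t) for t in taus]) / n   # Riemann weights
    return float(np.dot(weights, quantiles))

def cvar_spectrum(alpha: float):
    """Risk spectrum whose spectral risk equals CVaR at level alpha."""
    return lambda u: (u >= alpha) / (1.0 - alpha)

# Example: CVaR_0.9 of a toy cost-return distribution standing in for
# the quantile critic's output.
rng = np.random.default_rng(0)
sample_quantiles = np.sort(rng.normal(loc=1.0, scale=0.5, size=64))
print(spectral_risk(sample_quantiles, cvar_spectrum(0.9)))
```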