reproducibilityindex.ai

Safety through feedback in Constrained RL

Authors: Shashank Reddy Chirra, Pradeep Varakantham, Praveen Paruchuri

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We showcase the efficiency of our method through experimentation on several benchmark Safety Gymnasium environments and realistic self-driving scenarios. Our method demonstrates near-optimal performance, comparable to when the cost function is known, by relying solely on trajectory-level feedback across multiple domains. This highlights both the effectiveness and scalability of our approach. The code to replicate these results can be found at https://github.com/shshnkreddy/RLSF
Researcher Affiliation	Academia	Shashank Reddy Chirra (1), Pradeep Varakantham (1), Praveen Paruchuri (2); (1) Singapore Management University, (2) IIIT Hyderabad; {shashankc,pradeepv}@smu.edu.sg, praveen.p@iiit.ac.in
Pseudocode	Yes	Algorithm 1 Reinforcement Learning from Safety Feedback (RLSF)
Open Source Code	Yes	The code to replicate these results can be found at https://github.com/shshnkreddy/RLSF
Open Datasets	Yes	We evaluate RLSF on multiple continuous control benchmarks in the Safety Gymnasium environment [17] and MuJoCo [30] based environments introduced in [22].
Dataset Splits	No	The paper discusses 'train', 'validation', and 'test' in the context of policy and model evaluation, but does not provide specific data splits (e.g., percentages or counts) for any dataset used to allow reproduction of the data partitioning.
Hardware Specification	Yes	We conducted the experiments on a cluster equipped with 4 NVIDIA RTX A5000 GPUs and 96 core CPUs.
Software Dependencies	No	The paper mentions software components like 'PPO-Lagrangian algorithm [3]' and 'SimHash [10]' but does not provide specific version numbers for any software or libraries used in the implementation, which are needed to reproduce the software environment.
Experiment Setup	Yes	The detailed hyperparameters utilized in the experiments are presented in Table 3. All results presented in both the main paper and the appendix are based on three independent seeds, with the mean and standard error reported, unless explicitly stated otherwise.
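
The Experiment Setup entry says results are reported as the mean and standard error over three independent seeds. As a minimal sketch of that arithmetic, the snippet below computes both quantities for one metric. The function name and the per-seed values are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the entry does not say which estimator the authors used.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-seed results."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seeds to estimate a standard error")
    mean = statistics.fmean(values)
    # Assumption: sample standard deviation (n - 1), divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical per-seed returns for illustration; not values from the paper.
returns = [10.0, 12.0, 14.0]
mean, sem = mean_and_standard_error(returns)
print(f"{mean:.2f} +/- {sem:.2f}")  # prints "12.00 +/- 1.15"
```

With only three seeds the standard error is itself a noisy estimate, so it is worth reading reported intervals of this kind as indicative rather than precise.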