Enhancing Safe Exploration Using Safety State Augmentation

Authors: Aivar Sootla, Alexander Cowen-Rivers, Jun Wang, Haitham Bou Ammar

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We apply this idea to two safe RL problems: RL with constraints imposed on an average cost, and RL with constraints imposed on a cost with probability one. Our experiments suggest that simmering a safe algorithm can improve safety during training for both settings. We further show that Simmer can stabilize training and improve the performance of safe RL with average constraints.
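To make "safety state augmentation" concrete, here is a minimal Python sketch. It assumes a Saute-style construction: a normalised remaining budget z starts at 1, is updated after each step cost c as z <- (z - c/d) / gamma, and is appended to the observation. Once z reaches zero, the task reward is replaced by a penalty. The names SafetyState, augment and shaped_reward, the discount gamma and the penalty value are illustrative assumptions and are not taken from the SIMMER code base. Under this reading, "simmering" amounts to changing the budget d during training instead of keeping it fixed.

```python
from dataclasses import dataclass


@dataclass
class SafetyState:
    """Hypothetical remaining-budget tracker, normalised so that z_0 = 1."""

    budget: float        # total cost budget d (assumed > 0)
    gamma: float = 0.99  # discount applied to the safety state (assumption)
    z: float = 1.0       # normalised remaining budget

    def reset(self) -> float:
        self.z = 1.0
        return self.z

    def step(self, cost: float) -> float:
        # Subtract the normalised cost, then undo one step of discounting:
        # z_{t+1} = (z_t - c_t / d) / gamma
        self.z = (self.z - cost / self.budget) / self.gamma
        return self.z


def augment(observation: list[float], z: float) -> list[float]:
    """Append the safety state to the observation the policy sees."""
    return list(observation) + [z]


def shaped_reward(reward: float, z: float, penalty: float = 1.0) -> float:
    """Keep the task reward while budget remains, otherwise return -penalty."""
    return reward if z > 0 else -penalty
```

Because z is part of the observation, a single policy can, in principle, be conditioned on different remaining budgets. That property is what makes adjusting the budget during training possible without retraining from scratch.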
Researcher Affiliation	Collaboration	Aivar Sootla, Byju's Lab, aivar.sootla@gmail.com; Alexander I. Cowen-Rivers, Technische Universität Darmstadt, mc rivers@icloud.com; Jun Wang, University College London, jun.wang@cs.ucl.ac.uk; Haitham Bou Ammar, Huawei R&D, haitham.ammar@huawei.com
Pseudocode	Yes	Algorithm 1: PI-SIMMER (basic version); Algorithm 2: Q-SIMMER
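The pseudocode itself is in the paper. To roughly illustrate the control component suggested by the PI-Simmer hyper-parameters listed under Experiment Setup (K, Ki, Kaw, τ), the Python sketch below implements a generic discrete PI controller with back-calculation anti-windup and exponential smoothing of the measured signal. Several details are assumptions made for illustration only: which signal is measured, which reference is tracked, that the clipped output serves as the next safety budget, and that τ acts as a smoothing factor. The class name PIBudgetController and the output bounds are also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIBudgetController:
    """Generic discrete PI controller with anti-windup (hypothetical sketch)."""

    k: float = 0.01     # proportional gain (reported as K)
    k_i: float = 0.005  # integral gain (reported as Ki)
    k_aw: float = 0.01  # anti-windup (back-calculation) gain (reported as Kaw)
    tau: float = 0.995  # smoothing factor for the measurement (assumed role of τ)
    low: float = 0.0    # lower bound on the output (assumption)
    high: float = 1.0   # upper bound on the output (assumption)
    integral: float = 0.0
    smoothed: Optional[float] = None

    def update(self, reference: float, measurement: float) -> float:
        # Exponential moving average of the noisy measurement (e.g. episode cost).
        if self.smoothed is None:
            self.smoothed = measurement
        else:
            self.smoothed = self.tau * self.smoothed + (1.0 - self.tau) * measurement

        error = reference - self.smoothed
        raw = self.k * error + self.k_i * self.integral
        output = min(max(raw, self.low), self.high)

        # Back-calculation: when the output saturates, feed the clipping gap
        # back into the integral so it does not keep winding up.
        self.integral += error + self.k_aw * (output - raw)
        return output
```

The anti-windup term only matters when the output saturates. Without it, the integral keeps growing while the output is clipped, and the controller overshoots once the error changes sign.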
Open Source Code	Yes	The code for PI-Simmer and Q-Simmer is available at https://github.com/huawei-noah/HEBO/tree/master/SIMMER.
Open Datasets	Yes	Environments: We use the safe pendulum environment defined in [16], and we also use the custom-made safety gym environment with deterministic constraints, which we call static point goal [51]. ... The rest of our tests are performed on the safety gym benchmarks [37].
Dataset Splits	No	The paper does not describe train/validation/test splits; as a reinforcement learning study, it only names the environments used for evaluation. For example, it states "Mean returns and cost are computed over a hundred different trajectories obtained for three different seeds." but gives no data splits.
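The quoted evaluation note can be read as a two-level average. The sketch below uses a hypothetical summarize helper that averages returns (or costs) over the trajectories of each seed, then reports the mean and sample standard deviation across seeds. The text does not say whether the hundred trajectories are counted per seed or in total, so the helper accepts any number of trajectories per seed. The numbers in the example are toy values, not results from the paper.

```python
from statistics import mean, stdev


def summarize(per_seed: dict[int, list[float]]) -> tuple[float, float]:
    """Average within each seed, then return (mean, std) across seeds.

    `per_seed` maps a seed to the returns (or costs) of its evaluation
    trajectories. Hypothetical helper, not from the paper's code.
    """
    seed_means = [mean(values) for values in per_seed.values()]
    spread = stdev(seed_means) if len(seed_means) > 1 else 0.0
    return mean(seed_means), spread


# Toy numbers for illustration only (not results from the paper).
example = {0: [1.0, 3.0], 1: [2.0, 4.0], 2: [3.0, 5.0]}
print(summarize(example))  # -> (3.0, 1.0)
```

When every seed has the same number of trajectories, the mean of the seed means equals the pooled mean over all trajectories. The across-seed spread, however, only comes from the two-level view.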
Hardware Specification	Yes	Computational resources: We performed all computations on a PC equipped with 512GB of RAM, two Intel Xeon E5 CPUs, and four 16GB NVIDIA Tesla V100 GPUs.
Software Dependencies	No	The paper states that the code is based on "safety starter agents [37], and PID Lagrangian [44]" and that it uses "default parameters for both code bases unless stated otherwise." It also refers to Python indirectly in the ethics statement. However, it gives no version numbers for Python or for any library or other dependency needed to reproduce the results.
Experiment Setup	Yes	For PI-Simmer we chose the following hyper-parameters K = 0.01, Ki = 0.005, Kaw = 0.01 and τ = 0.995. ... For Q-Simmer we chose δ = 1, τ = 0.995, lr = 0.05, and ε = 0.95. ... We have used the same hyper-parameters for all algorithms, which are default parameters in safety starter agents and the learning rate 0.03. ... In all our experiments we used the same hyper-parameters for all versions of PID-L, i.e., K = 0.1, Ki = 0.01, γl = 0.99.
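For anyone re-running the experiments, the quoted values can be gathered in one place. The dataclass and field names below are hypothetical and do not mirror the configuration format of the released code. The symbol K is kept without a subscript because the extracted text does not say which gain it denotes. All other settings are the Safety Starter Agents defaults mentioned in the quote and are not listed here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PISimmerConfig:
    k: float = 0.01      # K as printed (subscript not recoverable from the text)
    k_i: float = 0.005   # Ki
    k_aw: float = 0.01   # Kaw
    tau: float = 0.995   # τ


@dataclass(frozen=True)
class QSimmerConfig:
    delta: float = 1.0     # δ
    tau: float = 0.995     # τ
    lr: float = 0.05       # lr
    epsilon: float = 0.95  # ε


@dataclass(frozen=True)
class PIDLagrangianConfig:
    k: float = 0.1         # K as printed
    k_i: float = 0.01      # Ki
    gamma_l: float = 0.99  # γl


# Learning rate reported as shared by all algorithms.
SHARED_LEARNING_RATE = 0.03

if __name__ == "__main__":
    for cfg in (PISimmerConfig(), QSimmerConfig(), PIDLagrangianConfig()):
        print(type(cfg).__name__, asdict(cfg))
```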