Safe Reinforcement Learning via Probabilistic Logic Shields
Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical evaluation of PLPG, comparing it to other baselines in terms of return and safety in a reinforcement learning setting. |
| Researcher Affiliation | Academia | (1) Leuven AI, KU Leuven, Belgium; (2) Stellenbosch University, South Africa; (3) Centre for Applied Autonomous Sensor Systems, Orebro University, Sweden |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code can be found on https://github.com/wenchiyang/pls. |
| Open Datasets | Yes | All environments in this work are available on GitHub under the MIT license. The source code and the data used to generate Figs. 5 to 7 can be found on https://github.com/wenchiyang/pls. |
| Dataset Splits | No | The paper describes generating random images for pre-training noisy sensors and mentions evaluating on a 'validation set' for that purpose, but does not specify train/validation/test splits for the main reinforcement learning experiments. |
| Hardware Specification | Yes | Experiments are run on machines that consist of Intel(R) Xeon(R) E3-1225 CPU cores and 32 GB memory. |
| Software Dependencies | No | All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. All the other hyperparameters are set to default as in stable-baselines [Hill et al., 2018]. |
| Experiment Setup | Yes | All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. |
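
The quoted experiment setup can be written down as a small configuration sketch. This is not the authors' code: it assumes the Stable-Baselines3 `PPO` constructor, whose keyword names match the hyperparameters quoted in the table, and a recent release that accepts `net_arch` as a dictionary. The names `PPO_KWARGS` and `make_agent` are hypothetical, and the environment is left to the caller.

```python
# Hyperparameters as quoted in the table; everything else stays at library defaults.
PPO_KWARGS = {
    "batch_size": 512,
    "n_epochs": 15,
    "n_steps": 2048,
    "clip_range": 0.1,
    "learning_rate": 1e-4,
    # Two hidden layers of size 64 for both the policy and the value network.
    # Assumption: dict-style net_arch, as accepted by recent Stable-Baselines3 releases.
    "policy_kwargs": {"net_arch": {"pi": [64, 64], "vf": [64, 64]}},
}


def make_agent(env):
    """Build a PPO agent from the quoted settings (hypothetical helper).

    Assumes Stable-Baselines3 is installed; `env` is any Gym-compatible
    environment supplied by the caller.
    """
    # Imported lazily so the configuration itself can be inspected without the library.
    from stable_baselines3 import PPO

    return PPO("MlpPolicy", env, **PPO_KWARGS)
```

With these values each rollout of 2048 steps splits into four minibatches of 512, and every rollout is reused for 15 optimization epochs.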