Safe Reinforcement Learning via Probabilistic Logic Shields

Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct an empirical evaluation of PLPG, comparing it to other baselines in terms of return and safety in a reinforcement learning setting."
Researcher Affiliation | Academia | "(1) Leuven AI, KU Leuven, Belgium; (2) Stellenbosch University, South Africa; (3) Centre for Applied Autonomous Sensor Systems, Örebro University, Sweden"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "The source code can be found on https://github.com/wenchiyang/pls."
Open Datasets | Yes | "All environments in this work are available on GitHub under the MIT license. The source code and the data used to generate Figs. 5 to 7 can be found on https://github.com/wenchiyang/pls."
Dataset Splits | No | The paper describes generating random images for pre-training the noisy sensors and mentions evaluating on a 'validation set' for that purpose, but it does not specify train/validation/test splits for the main reinforcement learning experiments.
Hardware Specification | Yes | "Experiments are run on machines that consist of Intel(R) Xeon(R) E3-1225 CPU cores and 32 GB memory."
Software Dependencies | No | "All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. All the other hyperparameters are set to default as in stable-baselines [Hill et al., 2018]."
Experiment Setup | Yes | "All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64."
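
For context, the reported training configuration maps onto a short setup script. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the stable-baselines3 API (the maintained successor of the stable-baselines library cited in the quotes), a placeholder Gymnasium environment, and an arbitrary training budget. The paper's own environments and shielded policies live at https://github.com/wenchiyang/pls.

```python
# Minimal sketch of the reported PPO hyperparameters, assuming stable-baselines3
# (the paper cites stable-baselines [Hill et al., 2018]). "CartPole-v1" and
# total_timesteps are placeholders, not values taken from the paper.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder; the paper uses its own environments

model = PPO(
    "MlpPolicy",
    env,
    batch_size=512,
    n_epochs=15,
    n_steps=2048,
    clip_range=0.1,
    learning_rate=1e-4,
    # Two hidden layers of size 64 for both the policy and value networks,
    # as stated in the experiment setup.
    policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
    verbose=1,
)
model.learn(total_timesteps=100_000)  # training budget is an assumption
```

All remaining hyperparameters are left at the library defaults, matching the quoted statement that "all the other hyperparameters are set to default as in stable-baselines".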