Safe Reinforcement Learning via Probabilistic Logic Shields

Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct an empirical evaluation of PLPG, comparing it to other baselines in terms of return and safety in a reinforcement learning setting."
Researcher Affiliation | Academia | "(1) Leuven AI, KU Leuven, Belgium; (2) Stellenbosch University, South Africa; (3) Centre for Applied Autonomous Sensor Systems, Örebro University, Sweden"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "The source code can be found on https://github.com/wenchiyang/pls."
Open Datasets | Yes | "All environments in this work are available on GitHub under the MIT license. The source code and the data used to generate Figs. 5 to 7 can be found on https://github.com/wenchiyang/pls."
Dataset Splits | No | The paper describes generating random images for pre-training the noisy sensors and mentions evaluating on a 'validation set' for that purpose, but it does not specify train/validation/test splits for the main reinforcement learning experiments.
Hardware Specification | Yes | "Experiments are run on machines that consist of Intel(R) Xeon(R) E3-1225 CPU cores and 32 GB memory."
Software Dependencies | No | "All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. All the other hyperparameters are set to default as in stable-baselines [Hill et al., 2018]."
Experiment Setup | Yes | "All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64."
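
For context, the reported training configuration maps onto a short setup script. The sketch below is an illustration under stated assumptions, not the authors' code: it uses the stable-baselines3 API (the maintained successor of the stable-baselines library cited in the quotes), a placeholder Gymnasium environment, and an arbitrary training budget. The paper's own environments and shielded policies live at https://github.com/wenchiyang/pls.

```python
# Minimal sketch of the reported PPO hyperparameters, assuming stable-baselines3
# (the paper cites stable-baselines [Hill et al., 2018]). "CartPole-v1" and
# total_timesteps are placeholders, not values taken from the paper.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder; the paper uses its own environments

model = PPO(
    "MlpPolicy",
    env,
    batch_size=512,
    n_epochs=15,
    n_steps=2048,
    clip_range=0.1,
    learning_rate=1e-4,
    # Two hidden layers of size 64 for both the policy and value networks,
    # as stated in the experiment setup.
    policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])),
    verbose=1,
)
model.learn(total_timesteps=100_000)  # training budget is an assumption
```

All remaining hyperparameters are left at the library defaults, matching the quoted statement that "all the other hyperparameters are set to default as in stable-baselines".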