Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Safe Reinforcement Learning via Probabilistic Logic Shields
Authors: Wen-Chi Yang, Giuseppe Marra, Gavin Rens, Luc De Raedt
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical evaluation of PLPG, comparing it to other baselines in terms of return and safety in a reinforcement learning setting. |
| Researcher Affiliation | Academia | ¹Leuven AI, KU Leuven, Belgium; ²Stellenbosch University, South Africa; ³Centre for Applied Autonomous Sensor Systems, Örebro University, Sweden |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code can be found on https://github.com/wenchiyang/pls. |
| Open Datasets | Yes | All environments in this work are available on GitHub under the MIT license. The source code and the data used to generate Figs. 5 to 7 can be found on https://github.com/wenchiyang/pls. |
| Dataset Splits | No | The paper describes generating random images for pre-training noisy sensors and mentions evaluating on a 'validation set' for that purpose, but does not specify train/validation/test splits for the main reinforcement learning experiments. |
| Hardware Specification | Yes | Experiments are run on machines that consist of Intel(R) Xeon(R) E3-1225 CPU cores and 32Gb memory. |
| Software Dependencies | No | All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. All the other hyperparameters are set to default as in stable-baselines [Hill et al., 2018]. |
| Experiment Setup | Yes | All agents are trained using PPO in stable-baselines [Hill et al., 2018] with batch_size=512, n_epochs=15, n_steps=2048, clip_range=0.1, learning_rate=0.0001. All policy networks and value networks have two hidden layers of size 64. |
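
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class `PPOSettings`, the helpers `ppo_kwargs` and `gradient_steps_per_rollout`, the `policy_kwargs`/`net_arch` layout, and the default of a single environment (`n_envs=1`) are all assumptions made for illustration. Only the numeric values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PPOSettings:
    """Values quoted in the Experiment Setup row (names are hypothetical)."""
    batch_size: int = 512
    n_epochs: int = 15
    n_steps: int = 2048
    clip_range: float = 0.1
    learning_rate: float = 1e-4
    hidden_layers: tuple = (64, 64)  # two hidden layers of size 64


def ppo_kwargs(settings):
    """Build keyword arguments for a PPO constructor.

    Assumption: the target library accepts a `policy_kwargs` dict with a
    `net_arch` list; the quoted text does not say how the layers are passed.
    """
    kwargs = asdict(settings)
    hidden = list(kwargs.pop("hidden_layers"))
    kwargs["policy_kwargs"] = {"net_arch": hidden}
    return kwargs


def gradient_steps_per_rollout(settings, n_envs=1):
    """Minibatch updates per rollout: (rollout size / batch size) * epochs.

    Assumption: n_envs=1; the number of parallel environments is not
    stated in the quoted text.
    """
    minibatches = (settings.n_steps * n_envs) // settings.batch_size
    return minibatches * settings.n_epochs


settings = PPOSettings()
print(ppo_kwargs(settings))
print(gradient_steps_per_rollout(settings))
```

Under the single-environment assumption, a rollout of 2048 steps splits into 4 minibatches of 512, so 15 epochs give 60 gradient updates per rollout. The dictionary could be unpacked into a PPO constructor whose parameter names match; the remaining hyperparameters are left at library defaults, as the quoted text states.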