Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Explainably Safe Reinforcement Learning

Authors: Sabine Rieder, Stefan Pranger, Debraj Chakraborty, Jan Kretinsky, Bettina Könighofer

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we compute explanations using decision trees that are several orders of magnitude smaller than the original shield. We evaluate our implementation on several challenging RL benchmarks, showing that the resulting explainable shields are compact and comprehensible, in contrast to traditional shields, which typically involve thousands of states.
Researcher Affiliation | Academia | Sabine Rieder (Masaryk University, Technical University of Munich); Stefan Pranger (Graz University of Technology); Debraj Chakraborty (Masaryk University); Jan Křetínský (Masaryk University, Technical University of Munich); Bettina Könighofer (Graz University of Technology)
Pseudocode | No | The paper describes the computational steps for shield and DT construction in prose, for example, under 'Shield Computation' and 'Computation of DT TL1', but does not present them in a formal pseudocode block or algorithm environment.
Open Source Code | Yes | We provide the implementation as supplementary material.
Open Datasets | Yes | We performed a first set of experiments using the Farama Frozen Lake environment [45]. We conducted our second set of experiments in the Farama Highway environment [28].
Dataset Splits | No | For the experiments on the scalability of our approach, we randomly generated instances of the Frozen Lake environment of various sizes. Per size, we generate 10 random instances and compare the sizes of the computed shields and tree representations. The paper describes generating environments and running multiple simulations but does not specify train/test/validation splits for a fixed dataset, which is typical for RL environments where agents interact directly.
Hardware Specification | Yes | All experiments were conducted on a laptop with an Intel Core i7-11800H CPU at 2.3 GHz with 32 GB of RAM.
Software Dependencies | No | The model checking queries were computed using TEMPEST [34], and the DT representations of shields using DTCONTROL [5]. We have trained agents on the Frozen Lake environment ... using the implementations from Stable-Baselines3 [37]. The paper mentions software tools but does not provide specific version numbers for them.
Experiment Setup | Yes | The shield preventing the agent from falling into a hole is computed with a horizon h = and a risk threshold ϵ = 0.075. We compute a shield that ensures collision avoidance by enforcing a safe distance of 20 m. In this example, the shield prohibits the agent from taking any risks (i.e., ϵ = 0) with h = . All training runs have been conducted using the default parameters.
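
To make the role of the risk threshold ϵ in the experiment setup concrete, here is a minimal sketch. It assumes a shield that permits an action only if its estimated probability of reaching an unsafe state within the horizon is at most ϵ. This is an illustrative simplification, not the paper's implementation (which relies on TEMPEST and DTCONTROL); the function name `shielded_actions` and all risk values are hypothetical.

```python
from typing import Dict, List


def shielded_actions(action_risk: Dict[str, float], epsilon: float) -> List[str]:
    """Return the actions whose estimated risk does not exceed epsilon.

    action_risk maps each action to an assumed probability of reaching an
    unsafe state within the horizon. How these probabilities are computed
    (e.g. by a model checker) is outside the scope of this sketch.
    """
    return [action for action, risk in action_risk.items() if risk <= epsilon]


# Hypothetical risk values for a single state (illustrative only).
risks = {"left": 0.0, "down": 0.05, "right": 0.33, "up": 0.1}

# Threshold 0.075, as reported for the Frozen Lake shield.
allowed_with_threshold = shielded_actions(risks, epsilon=0.075)

# Threshold 0, as reported for the shield that prohibits any risk.
allowed_zero_risk = shielded_actions(risks, epsilon=0.0)

print(allowed_with_threshold)
print(allowed_zero_risk)
```

Under these assumptions, lowering ϵ can only shrink the set of permitted actions. The sketch does not handle the case where no action satisfies the threshold.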