Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

ASP-Driven Emergency Planning for Norm Violations in Reinforcement Learning

Authors: Sebastian Adam, Thomas Eiter

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we showcase the practical applicability of the framework in multiple domains, including deterministic and nondeterministic environments. For evaluation, we compare the norm compliance of using an RL policy π (which is unaware of additional norms) against the use of π in combination with our framework. In particular, we are interested in the effect of the horizon k on norm violations.
Researcher Affiliation Academia Institute of Logic and Computation, Vienna University of Technology Favoritenstraße 9-11, A-1040 Vienna, Austria EMAIL
Pseudocode Yes
Listing 1: Excerpt of Pgen
1 plant(4,4).
2 player(0,2,6).
3 p_n(T) | p_s(T) | p_w(T) | p_e(T) :- T = 1..k.
4 player(T,C',R) :- player(T-1,C,R), p_w(T), C' = C-1.
...
8 :~ player(T,C,R), plant(C,R). [1@2]
9 #maximize {R@1,T : reward(R,A,T)}.
Listing 2: Excerpt of Pcheck
10 frog(0,6,2).
11 f_n(T) | f_s(T) | f_w(T) | f_e(T) :- T = 1..k.
12 frog(T,C',R) :- frog(T-1,C,R), f_w(T), C' = C-1.
...
16 ok(0).
17 ok(T) :- player(T,C,R), frog(T,C',R'), C' != C, ok(T-1).
...
19 :- not sat.
20 sat :- ok(k).
21 f_n(T) :- T = 1..k, sat.
...
25 sat :- frog(T,C,R), wall(C,R).
26 good(T) | bad(T) :- T = 1..k.
27 ok(T) :- bad(T).
28 #minimize {1@3,T : bad(T)}.
Open Source Code Yes Git https://github.com/S3basuchian/emergency-planning
Open Datasets Yes We extended the Berkeley AI code for Pacman (DeNero, Klein, and Abbeel 2014) and use the associated instances: small Classic (grid size 20×7; 2 ghosts), medium Classic (20×11; 2 ghosts) and original Classic (28×27; 4 ghosts).
Dataset Splits No The paper describes generating '80 instances of Gardener' and testing '1000 times' for Pacman, but does not provide specific train/test/validation dataset splits (e.g., percentages, sample counts, or explicit split files) for its experiments.
Hardware Specification Yes The experiments were run on a Linux server with two Intel Xeon CPU E5-2650 v4 (12 cores @ 2.20GHz, no hyperthreading) and 256GB RAM.
Software Dependencies Yes We use the ASP solver clingo (version 5.7.1) to solve the ASP program of the framework. Further code and the RL implementations are in Python. Using the state-of-the-art CCN simulator ndnSIM 2.9 (Mastorakis, Afanasyev, and Zhang 2017), we simulate this network for 30 seconds on 5 instances differing in the type and frequency of consumer requests, while observing the cache (size 18) of a router that is a bottleneck for all traffic.
Experiment Setup Yes We use utility-based policy fixing to prevent the agent from getting stuck due to unsolvability (i.e., when no strict k-policy fix is found) and to ease achieving the goal. We use the ASP solver clingo (version 5.7.1) to solve the ASP program of the framework. Further code and the RL implementations are in Python. The experiments were run on a Linux server with two Intel Xeon CPU E5-2650 v4 (12 cores @ 2.20GHz, no hyperthreading) and 256GB RAM. In the policy fix, the penalty for killing a living entity is higher than any action reward. Plans admitting repeated states are penalized to discourage looping in norm-compliant trajectories. An extra reward for reaching the target is given, and rewards for obeying the policy preference increase on cell revisits. Earlier violations and norm adherence are weighted higher to discourage immediate violations or policy deviations.
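The interplay the setup describes — run the norm-unaware policy π, and switch to a k-step emergency plan when a norm violation is foreseeable within horizon k — can be sketched in Python. Everything in this sketch is hypothetical and only illustrative: the grid, the "do not step on a plant" norm, the greedy stand-in policy, and a BFS planner standing in for the paper's ASP-based plan computation with clingo.

```python
from collections import deque

# Hypothetical toy domain (not the paper's Gardener instances):
# a 7x7 grid with one protected plant cell and a fixed goal.
GRID_W, GRID_H = 7, 7
PLANTS = {(3, 0)}            # norm: never enter a plant cell
GOAL = (6, 6)
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]

def greedy_policy(pos):
    """Norm-unaware stand-in for the RL policy: move greedily toward GOAL."""
    x, y = pos
    if x != GOAL[0]:
        return (x + (1 if GOAL[0] > x else -1), y)
    return (x, y + (1 if GOAL[1] > y else -1))

def violates_within(pos, k):
    """Would following the policy for k steps violate the norm?"""
    for _ in range(k):
        pos = greedy_policy(pos)
        if pos in PLANTS:
            return True
    return False

def emergency_plan(pos, k):
    """BFS for a norm-compliant plan of at most k steps
    (the framework computes such a k-policy fix with ASP instead)."""
    queue, seen = deque([(pos, [])]), {pos}
    while queue:
        cur, path = queue.popleft()
        if cur == GOAL or len(path) == k:
            return path
        for dx, dy in MOVES:
            nxt = (cur[0] + dx, cur[1] + dy)
            if (0 <= nxt[0] < GRID_W and 0 <= nxt[1] < GRID_H
                    and nxt not in PLANTS and nxt not in seen):
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return []                # no strict k-plan found

def run(start, k, max_steps=50):
    """Execute the policy, overriding it with an emergency plan on demand;
    if no plan exists, fall back to the policy (cf. utility-based fixing)."""
    pos, violations, steps = start, 0, 0
    while pos != GOAL and steps < max_steps:
        plan = emergency_plan(pos, k) if violates_within(pos, k) else []
        for nxt in plan or [greedy_policy(pos)]:
            pos, steps = nxt, steps + 1
            violations += nxt in PLANTS
            if pos == GOAL or steps >= max_steps:
                break
    return pos, violations
```

With these toy definitions, the bare policy would walk through the plant at (3, 0), while the combined agent detours around it and reaches the goal without violations; larger k lets the agent foresee (and avoid) violations earlier, mirroring the horizon study in the evaluation.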