Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
ASP-Driven Emergency Planning for Norm Violations in Reinforcement Learning
Authors: Sebastian Adam, Thomas Eiter
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we showcase the practical applicability of the framework in multiple domains, including deterministic and nondeterministic environments. For evaluation, we compare the norm compliance of using an RL policy π (which is unaware of additional norms) against the use of π in combination with our framework. In particular, we are interested in the effect of the horizon k on norm violations. |
| Researcher Affiliation | Academia | Institute of Logic and Computation, Vienna University of Technology Favoritenstraße 9-11, A-1040 Vienna, Austria EMAIL |
| Pseudocode | Yes | Listing 1: Excerpt of Pgen — 1 plant(4,4). 2 player(0,2,6). 3 p_n(T)\|p_s(T)\|p_w(T)\|p_e(T) :- T=1..k. 4 player(T,C',R) :- player(T-1,C,R), p_w(T), C'=C-1. ... 8 :~ player(T,C,R), plant(C,R). [1@2] 9 #maximize {R@1,T : reward(R,A,T)}. Listing 2: Excerpt of Pcheck — 10 frog(0,6,2). 11 f_n(T)\|f_s(T)\|f_w(T)\|f_e(T) :- T=1..k. 12 frog(T,C',R) :- frog(T-1,C,R), f_w(T), C'=C-1. ... 16 ok(0). 17 ok(T) :- player(T,C,R), frog(T,C',R'), C'!=C, ok(T-1). ... 19 :- not sat. 20 sat :- ok(k). 21 f_n(T) :- T=1..k, sat. ... 25 sat :- frog(T,C,R), wall(C,R). 26 good(T) \| bad(T) :- T=1..k. 27 ok(T) :- bad(T). 28 #minimize {1@3,T : bad(T)}. |
| Open Source Code | Yes | Git https://github.com/S3basuchian/emergency-planning |
| Open Datasets | Yes | We extended the Berkeley AI code for Pacman (DeNero, Klein, and Abbeel 2014) and use associated instances: smallClassic (grid size 20×7; 2 ghosts), mediumClassic (20×11; 2 ghosts), and originalClassic (28×27; 4 ghosts). |
| Dataset Splits | No | The paper describes generating '80 instances of Gardener' and testing '1000 times' for Pacman, but does not provide specific train/test/validation dataset splits (e.g., percentages, sample counts, or explicit split files) for its experiments. |
| Hardware Specification | Yes | The experiments were run on a Linux server with two Intel Xeon CPU E5-2650 v4 (12 cores @ 2.20GHz, no hyperthreading) and 256GB RAM. |
| Software Dependencies | Yes | We use the ASP solver clingo (version 5.7.1) to solve the ASP program of the framework. Further code and the RL implementations are in Python. Using the state-of-the-art CCN simulator ndnSIM 2.9 (Mastorakis, Afanasyev, and Zhang 2017), we simulate this network for 30 seconds on 5 instances differing in the type and frequency of consumer requests, while observing the cache (size 18) of a router that is a bottleneck for all traffic. |
| Experiment Setup | Yes | We use utility-based policy fixing to prevent the agent from getting stuck when the problem is unsolvable (i.e., no strict k-policy fix is found) and to ease achieving the goal. We use the ASP solver clingo (version 5.7.1) to solve the ASP program of the framework. Further code and the RL implementations are in Python. The experiments were run on a Linux server with two Intel Xeon CPU E5-2650 v4 (12 cores @ 2.20GHz, no hyperthreading) and 256GB RAM. In the policy fix, the penalty for killing a living entity is higher than any action reward. Plans admitting repeated states are penalized to discourage looping in norm-compliant trajectories. An extra reward for reaching the target is given, and rewards for obeying the policy preference increase on cell revisits. Earlier violations and norm adherence are weighted higher to discourage immediate violations or policy deviations. |
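The excerpts above describe the framework's core loop: simulate the norm-unaware RL policy π over a horizon of k steps, and if a norm violation would occur, replace π's actions with a norm-compliant plan (found via ASP in the paper). The sketch below illustrates that idea in plain Python on a toy gridworld; all names (`simulate`, `find_compliant_plan`, the plant-avoidance norm, the specific instance) are illustrative assumptions, not the paper's implementation, and the BFS planner merely stands in for the clingo call.

```python
# Hypothetical sketch of k-step lookahead policy fixing: before executing
# the RL policy's next actions, simulate it for horizon k; if a norm
# violation (here: entering a plant cell) would occur, fall back to a
# norm-compliant plan found by search (a stand-in for the ASP solver).
from collections import deque

MOVES = {"n": (0, -1), "s": (0, 1), "w": (-1, 0), "e": (1, 0)}

def simulate(pos, policy, plants, k):
    """Follow `policy` for up to k steps; return the step index of the
    first norm violation (entering a plant cell), or None if compliant."""
    for t in range(1, k + 1):
        dx, dy = MOVES[policy(pos)]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in plants:
            return t
    return None

def find_compliant_plan(start, goal, plants, width, height):
    """BFS for a shortest plant-avoiding action sequence to the goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        pos, plan = queue.popleft()
        if pos == goal:
            return plan
        for action, (dx, dy) in MOVES.items():
            nxt = (pos[0] + dx, pos[1] + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in plants and nxt not in seen):
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # no norm-compliant plan exists

# Toy instance: the policy always moves east, but a plant blocks the path.
plants = {(2, 0)}
policy = lambda pos: "e"
violation_at = simulate((0, 0), policy, plants, k=4)
plan = find_compliant_plan((0, 0), (4, 0), plants, width=5, height=2)
print(violation_at)  # step at which the naive policy would hit the plant
print(plan)          # a detour that respects the norm
```

This mirrors the division of labor between Pgen (generating candidate plans that maximize reward) and Pcheck (verifying norm compliance), collapsed here into a single search for brevity.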