Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

Authors: Till Hofmann, Hector Geffner

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness.3 6.1 Experimental Results We modeled and solved the min-cost SAT problem represented by the theory T(S, F) as an Answer Set Program (ASP) [Lifschitz, 2016] in clingo [Gebser et al., 2011].
Researcher Affiliation Academia Till Hofmann and Hector Geffner RWTH Aachen University till.hofmann@ml.rwth-aachen.de, hector.geffner@ml.rwth-aachen.de
Pseudocode Yes Algorithm 1 Dead-End Detection Input: FOND model M(P) = S, s0, SG, Act, A, F Output: FOND dead-end set D S 1: D ; 2: repeat 3: for all s S \ D do 4: for all a A(s) do 5: if F(a, s)  D = then 6: Remove a from A(s) 7: for all s S \ D do 8: if path s a1 . . . ak sg. ai A(si), sg SG then 9: Add s to D 10: until D does not change 11: return D
Open Source Code Yes The source code, benchmark domains, and results are available at [Hofmann and Geffner, 2024a]. The reference [Hofmann and Geffner, 2024a] is listed as: Till Hofmann and Hector Geffner. Generalized FOND planning. Zenodo, May 2024. https://doi. org/10.5281/zenodo.11171181.
Open Datasets Yes The FOND domains considered were taken from the FOND-SAT distribution [Geffner and Geffner, 2018], leaving out domains with unsupported features.
Dataset Splits Yes More precisely, starting with a singleton training set consisting of the smallest instance of P, the solver learns a new policy and iteratively tests whether the policy solves the next problem. If this validation fails, the failed instance is added to the training set and the process repeats.
Hardware Specification Yes All experiments were run on Intel Xeon Platinum 8352M CPUs with 32 threads, a memory limit of 220 GB
Software Dependencies No The paper mentions 'clingo', 'pddl', and 'DLPlan' as software used, but it does not specify version numbers for these software components. For example, 'clingo [Gebser et al., 2011]' refers to a paper from 2011, not a software version number. Similarly, 'pddl [Favorito et al., 2023]' and 'DLPlan [Drexler et al., 2022a]' refer to papers, not specific software versions.
Experiment Setup Yes As optimizations, instead of using the ranking V (s, d), we incrementally label all states where all selected transitions lead to the goal as safe and require that all alive states are also safe. Additionally, we do not try to distinguish all dead states from alive states and instead only compare alive states to critical states, which are those states that are dead-ends but have an incoming transition from an alive state. Finally, we preprocess the state space S by pruning all dead states that are not critical. with a maximal feature complexity cmax = 15.