Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Counting and Reasoning with Plans

Authors: David Speck, Markus Hecher, Daniel Gnad, Johannes K. Fichte, Augusto B. Corrêa

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical Evaluation We implemented our reasoning framework for classical planning as a tool called Planalyst. Therefore, we transform planning tasks into SAT formulas based on the Madagascar planner (Rintanen 2011, 2014). To efficiently carry out counting, we use d4 (Lagniez and Marquis 2017; Audemard, Lagniez, and Miceli 2022), which compiles (potentially large) formulas into a specialized normal form called d-DNNF (Darwiche and Marquis 2002), enabling fast reasoning. Finally, we reason over the plan space via counting queries using the ddnnife reasoner (Sundermann et al. 2024), which works in poly-time on d-DNNFs.
Researcher Affiliation Academia 1University of Basel, Switzerland 2Univ. Artois, CNRS, UMR 8188, Centre de Recherche en Informatique de Lens (CRIL), F-62300 Lens, France 3CSAIL, Massachusetts Institute of Technology, United States 4Link oping University, Sweden 5University of Oxford, United Kingdom EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes theoretical concepts, defines problems, and presents empirical results but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code Yes Source code, benchmarks, and data are available online (Speck et al. 2024). Speck, D.; Hecher, M.; Gnad, D.; Fichte, J. K.; and Corrˆea, A. B. 2024. Code, benchmarks and data for the AAAI 2025 paper Counting and Reasoning with Plans . https://doi.org/10.5281/ zenodo.14499686.
Open Datasets Yes Our benchmarks include all optimal planning domains from IPCs 1998-2023 with unit operator costs and without conditional effects or axioms. Source code, benchmarks, and data are available online (Speck et al. 2024).
Dataset Splits No The paper uses benchmark sets from IPCs 1998-2023 and defines an upper bound for plan length for each task. It does not describe explicit training/test/validation splits for these benchmark tasks in the traditional sense of machine learning datasets.
Hardware Specification Yes All experiments ran on Intel Xeon Silver 4114 processors running at 2.2 GHz. We used a time limit of 30 minutes and a memory limit of 6 Gi B per task.
Software Dependencies No The paper mentions using "Madagascar planner," "d4," and "ddnnife reasoner" but does not specify their version numbers.
Experiment Setup Yes In the experiments, we count plans of length up to a multiplicative factor c {1.0, 1.1, 1.2, 1.3, 1.4, 1.5} of the collected upper bounds. We ran both baseline planners in their recommended configurations2: K , which implements orbit-space search (Katz and Lee 2023) with the landmark-cut heuristic (Helmert and Domshlak 2009), and Sym K, which implements a variant of bidirectional symbolic search (Torralba et al. 2017). We used a time limit of 30 minutes and a memory limit of 6 Gi B per task.