Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Lilotane: A Lifted SAT-based Approach to Hierarchical Planning

Authors: Dominik Schreiber

JAIR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluations confirm that Lilotane outperforms established SAT-based approaches, often by orders of magnitude, produces much smaller formulae on average, and compares favorably to other state-of-the-art HTN planners regarding robustness and plan quality. In the International Planning Competition (IPC) 2020, a preliminary version of Lilotane scored the second place.
Researcher Affiliation	Academia	Dominik Schreiber, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131 Karlsruhe, Germany
Pseudocode	Yes	Algorithm 1: Lilotane Planning Procedure
Open Source Code	Yes	Our source code is available at www.github.com/domschrei/lilotane and all experimental data is available at www.github.com/domschrei/lilotane-experimental-data.
Open Datasets	Yes	The IPC was based on an exceptionally large and diverse set of benchmarks for hierarchical planning... Table 3 lists averaged properties of old and new benchmarks in accordance with our complexity model... The Factories HTN domain. In Proceedings of the 2020 International Planning Competition (IPC). To appear.
Dataset Splits	No	The paper evaluates planning performance on problem instances from benchmarks (e.g., IPC 2020 benchmarks). It does not involve machine learning-style training, validation, or test splits of a dataset for model development. Instead, the evaluation is performed on a collection of problem instances.
Hardware Specification	Yes	The experiments have been conducted on a desktop PC running Ubuntu 18.04 with a quad-core Intel i7-6700 processor clocked at 3.40GHz and with 32GB of DDR4 RAM. ... The evaluations were conducted on a server with an AMD EPYC 7702P 64-Core processor (plus hyperthreading) clocked between 2.0 and 3.35 GHz with 1024 GB of DDR4 RAM, running Ubuntu 20.04.
Software Dependencies	No	We have implemented our approach in C++17. Our source code is available at www.github.com/domschrei/lilotane... We make use of pandaPIparser (Behnke et al., 2020)... We used the Re-entrant Incremental SAT solver API (IPASIR, see Balyo, Biere, Iser, & Sinz, 2016) and link our software with a SAT solver. As was the case for Tree-REX, we found Glucose (Audemard & Simon, 2009) to empirically work best among various solvers... We use PANDA in conjunction with SAT solver Cryptominisat (Soos, Nohl, & Castelluccia, 2009)... Although several software tools are mentioned, specific version numbers (e.g., for Glucose or Cryptominisat) are not explicitly provided, only their defining publications.
Experiment Setup	Yes	We set a timeout of five minutes and a memory limit of 8GB. The experiments have been conducted on a desktop PC... The runs were performed sequentially... We executed up to 63 runs in parallel and set a time limit of 30 minutes and a memory limit of 8GB as in the IPC.