Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Making Translations to Classical Planning Competitive with Other HTN Planners
Authors: Gregor Behnke, Florian Pollitt, Daniel Höller, Pascal Bercher, Ron Alford
AAAI 2022, pp. 9687–9697 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical Evaluation: We have compared our proposed encodings, which we call HTN2SAS2, with a wide variety of other planners, including the IPC 2020 competitors, on the IPC 2020 benchmark set (Behnke, Höller, and Bercher 2021). In Tab. 1, we show standard and normalized coverage and the IPC score of the several TO-HTN planners on the IPC 2020 benchmark set. |
| Researcher Affiliation | Collaboration | 1University of Freiburg, Germany, 2ILLC, University of Amsterdam, The Netherlands, 3Saarland University, Saarland Informatics Campus, Saarbrücken, Germany, 4The Australian National University, College of Engineering & Computer Science, Canberra, Australia, 5MITRE, McLean, VA, USA |
| Pseudocode | No | The paper describes the encodings and methods using mathematical notation and descriptive text, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is integrated into the pandaPI system and can be found at https://github.com/panda-planner-dev/pandaPIengine. |
| Open Datasets | Yes | We have compared our proposed encodings, which we call HTN2SAS2, with a wide variety of other planners, including the IPC 2020 competitors, on the IPC 2020 benchmark set (Behnke, Höller, and Bercher 2021). |
| Dataset Splits | No | The paper mentions using the IPC 2020 benchmark set and discusses train, validation, and test in the context of classical planning, but it does not provide specific details on how the dataset was split into training, validation, and test sets for their experiments, such as percentages, sample counts, or specific splitting methodology. |
| Hardware Specification | Yes | Each planner was given 8 GiB of RAM and 30 minutes of single-core runtime on a Xeon Gold 6242 CPU per instance. |
| Software Dependencies | No | The paper mentions various planners and systems used (e.g., 'Fast Downward', 'Saarplan', 'pandaPI system') and cites their corresponding papers, but it does not provide specific version numbers for these software components or other ancillary software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | Each planner was given 8 GiB of RAM and 30 minutes of single-core runtime on a Xeon Gold 6242 CPU per instance. Runtime includes the time for grounding, encoding, and solving by the back-end planner. We found that the best performing back-end was Fast Downward (Helmert 2006), first performing enforced hill climbing and then lazy greedy search, both with the FF heuristic (Hoffmann and Nebel 2001) and FF preferred operators. |