Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Planning for Hybrid Systems via Satisfiability Modulo Theories
Authors: Michael Cashmore, Daniele Magazzeni, Parisa Zehtabi
JAIR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present our experimental results in four parts. |
| Researcher Affiliation | Academia | Michael Cashmore EMAIL University of Strathclyde 26 Richmond Street, G1 1XH, Glasgow, UK Daniele Magazzeni EMAIL Parisa Zehtabi EMAIL King s College London Bush House, WC2B 4BG, London, UK |
| Pseudocode | Yes | Algorithm 1: SMTPlan |
| Open Source Code | Yes | 1. The source code and documentation for SMTPlan can be obtained from https://github.com/KCL-Planning/SMTPlan |
| Open Datasets | Yes | All test domains and problems are available at: kcl-planning.github.io/SMTPlan. |
| Dataset Splits | No | The paper evaluates its planner on benchmark problems and problem instances, but the concept of training/test/validation dataset splits, as typically understood in machine learning, is not applicable to the experimental methodology of solving planning problems. The paper does not describe how to split a dataset for training and testing its planner. |
| Hardware Specification | No | The experiments were run using 8GB of RAM and a 30 minute timeout. |
| Software Dependencies | Yes | In the current implementation of SMTPlan the CAS piranha is used, as it is able to solve polynomial non-linear integrals. ... The SMT solver we use is z3 (De Moura & Bjรธrner, 2008). ... (Biscani et al. (2018, May). bluescarni/piranha: piranha 0.11.) |
| Experiment Setup | Yes | For all experiments we use an initial happening bound of 2 (assuming the goal to not hold in the intial state) a step size of 1, and a bound on the length of the event cascade of 2. The bound on the event cascade is known to be the longest cycle-free chain of events possible across all of the benchmark domains. |