Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Planning for Hybrid Systems via Satisfiability Modulo Theories

Authors: Michael Cashmore, Daniele Magazzeni, Parisa Zehtabi

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we present our experimental results in four parts.
Researcher Affiliation Academia Michael Cashmore EMAIL University of Strathclyde 26 Richmond Street, G1 1XH, Glasgow, UK Daniele Magazzeni EMAIL Parisa Zehtabi EMAIL King s College London Bush House, WC2B 4BG, London, UK
Pseudocode Yes Algorithm 1: SMTPlan
Open Source Code Yes 1. The source code and documentation for SMTPlan can be obtained from https://github.com/KCL-Planning/SMTPlan
Open Datasets Yes All test domains and problems are available at: kcl-planning.github.io/SMTPlan.
Dataset Splits No The paper evaluates its planner on benchmark problems and problem instances, but the concept of training/test/validation dataset splits, as typically understood in machine learning, is not applicable to the experimental methodology of solving planning problems. The paper does not describe how to split a dataset for training and testing its planner.
Hardware Specification No The experiments were run using 8GB of RAM and a 30 minute timeout.
Software Dependencies Yes In the current implementation of SMTPlan the CAS piranha is used, as it is able to solve polynomial non-linear integrals. ... The SMT solver we use is z3 (De Moura & Bjรธrner, 2008). ... (Biscani et al. (2018, May). bluescarni/piranha: piranha 0.11.)
Experiment Setup Yes For all experiments we use an initial happening bound of 2 (assuming the goal to not hold in the intial state) a step size of 1, and a bound on the length of the event cascade of 2. The bound on the event cascade is known to be the longest cycle-free chain of events possible across all of the benchmark domains.