Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Planning for Hybrid Systems via Satisfiability Modulo Theories

Authors: Michael Cashmore, Daniele Magazzeni, Parisa Zehtabi

JAIR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section we present our experimental results in four parts.
Researcher Affiliation	Academia	Michael Cashmore EMAIL University of Strathclyde 26 Richmond Street, G1 1XH, Glasgow, UK Daniele Magazzeni EMAIL Parisa Zehtabi EMAIL King s College London Bush House, WC2B 4BG, London, UK
Pseudocode	Yes	Algorithm 1: SMTPlan
Open Source Code	Yes	1. The source code and documentation for SMTPlan can be obtained from https://github.com/KCL-Planning/SMTPlan
Open Datasets	Yes	All test domains and problems are available at: kcl-planning.github.io/SMTPlan.
Dataset Splits	No	The paper evaluates its planner on benchmark problems and problem instances, but the concept of training/test/validation dataset splits, as typically understood in machine learning, is not applicable to the experimental methodology of solving planning problems. The paper does not describe how to split a dataset for training and testing its planner.
Hardware Specification	No	The experiments were run using 8GB of RAM and a 30 minute timeout.
Software Dependencies	Yes	In the current implementation of SMTPlan the CAS piranha is used, as it is able to solve polynomial non-linear integrals. ... The SMT solver we use is z3 (De Moura & Bjørner, 2008). ... (Biscani et al. (2018, May). bluescarni/piranha: piranha 0.11.)
Experiment Setup	Yes	For all experiments we use an initial happening bound of 2 (assuming the goal to not hold in the intial state) a step size of 1, and a bound on the length of the event cascade of 2. The bound on the event cascade is known to be the longest cycle-free chain of events possible across all of the benchmark domains.