Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bounding Quality in Diverse Planning

Authors: Michael Katz, Shirin Sohrabi, Octavian Udrea
Pages: 9805-9812

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental Evaluation: To empirically evaluate the feasibility of our suggested approach, we have implemented our diverse planners on top of the Diversity Score Computation component (Katz and Sohrabi 2019), using CPLEX v12.8.0 for solving the mixed integer linear programs.
Researcher Affiliation | Industry | (1) IBM Research, Yorktown Heights, NY, USA; (2) Dataminr, New York, NY, USA
Pseudocode | No | The paper describes mixed integer linear program formulations, but does not present them as structured pseudocode or an algorithm block labeled 'Algorithm'.
Open Source Code | Yes | The code is available at https://github.com/IBM/diversescore.
Open Datasets | Yes | The benchmark set consists of all STRIPS benchmarks from optimal tracks of International Planning Competitions (IPC) 1998-2018, a total of 1797 tasks in 64 domains.
Dataset Splits | No | The paper describes using a 'benchmark set' from International Planning Competitions, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | The experiments were performed on Intel(R) Xeon(R) CPU E7-8837 @2.67GHz machines, with the time and memory limit of 30min and 2GB, respectively.
Software Dependencies | Yes | ... using CPLEX v12.8.0 for solving the mixed integer linear programs.
Experiment Setup | Yes | We run these planners with a 29min time bound, to allow at least one minute for the second step. In all cases, the overall time bound for both steps is 30min. Further, to avoid generating a larger amount of plans, the overall bound on the number of generated plans for the first step is set to 10000.
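The Hardware Specification and Experiment Setup rows together describe a two-step protocol under shared limits: a 30 min overall budget, a 29 min bound on the first step (so the second step always gets at least one minute), a 2 GB memory limit, and a cap of 10000 plans for the first step. The following is a minimal Python sketch of how such limits could be enforced on a POSIX system. It assumes that each step is an external command. The names `step2_budget`, `run_limited`, and `run_two_step` are hypothetical and are not taken from the authors' code at https://github.com/IBM/diversescore.

```python
import resource
import subprocess
import time

# Budgets taken from the Experiment Setup / Hardware Specification rows.
TOTAL_BUDGET_S = 30 * 60           # overall bound for both steps (30 min)
STEP1_BUDGET_S = 29 * 60           # first-step bound, leaving >= 60 s for step 2
MEMORY_LIMIT_BYTES = 2 * 1024**3   # "2GB", interpreted here as 2 GiB (assumption)
MAX_PLANS_STEP1 = 10000            # would be passed to the first-step planner;
                                   # the actual planner flag is not shown (hypothetical)


def step2_budget(step1_elapsed_s: float, total_s: float = TOTAL_BUDGET_S) -> float:
    """Seconds left for step 2 after step 1 has consumed its share.

    Step 1 is cut off at STEP1_BUDGET_S, so its charged time is clamped to that
    bound; this guarantees step 2 at least TOTAL_BUDGET_S - STEP1_BUDGET_S seconds.
    """
    charged = min(step1_elapsed_s, STEP1_BUDGET_S)
    return total_s - charged


def _limit_memory() -> None:
    # Runs in the child process before exec: cap its address space.
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_BYTES, MEMORY_LIMIT_BYTES))


def run_limited(cmd: list[str], timeout_s: float) -> subprocess.CompletedProcess:
    """Run cmd under the memory cap and a wall-clock timeout.

    subprocess.run kills the child and raises TimeoutExpired on timeout.
    """
    return subprocess.run(
        cmd,
        preexec_fn=_limit_memory,
        timeout=timeout_s,
        capture_output=True,
        text=True,
    )


def run_two_step(step1_cmd: list[str], step2_cmd: list[str]) -> subprocess.CompletedProcess:
    """Hypothetical driver: step 1 gets up to 29 min, step 2 gets the remainder."""
    start = time.monotonic()
    try:
        run_limited(step1_cmd, STEP1_BUDGET_S)
    except subprocess.TimeoutExpired:
        pass  # step 1 was stopped at its bound; step 2 still runs
    elapsed = time.monotonic() - start
    return run_limited(step2_cmd, step2_budget(elapsed))
```

Clamping the charged time in `step2_budget` reflects the stated guarantee that the second step always receives at least one minute. If the first step finishes early, the unused time carries over to the second step, which keeps the overall bound at 30 min.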