Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Bounding Quality in Diverse Planning

Authors: Michael Katz, Shirin Sohrabi, Octavian Udrea | Pages: 9805-9812

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental Evaluation: To empirically evaluate the feasibility of our suggested approach, we have implemented our diverse planners on top of the Diversity Score Computation component (Katz and Sohrabi 2019), using CPLEX v12.8.0 for solving the mixed integer linear programs.
Researcher Affiliation	Industry	(1) IBM Research, Yorktown Heights, NY, USA; (2) Dataminr, New York, NY, USA
Pseudocode	No	The paper describes mixed integer linear program formulations, but does not present them as structured pseudocode or an algorithm block labeled 'Algorithm'.
Open Source Code	Yes	The code is available at https://github.com/IBM/diversescore.
Open Datasets	Yes	The benchmark set consists of all STRIPS benchmarks from optimal tracks of International Planning Competitions (IPC) 1998-2018, a total of 1797 tasks in 64 domains.
Dataset Splits	No	The paper describes using a 'benchmark set' from International Planning Competitions, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification	Yes	The experiments were performed on Intel(R) Xeon(R) CPU E7-8837 @2.67GHz machines, with the time and memory limit of 30min and 2GB, respectively.
Software Dependencies	Yes	using CPLEX v12.8.0 for solving the mixed integer linear programs.
Experiment Setup	Yes	We run these planners with a 29min time bound, to allow at least one minute for the second step. In all cases, the overall time bound for both steps is 30min. Further, to avoid generating a larger amount of plans, the overall bound on the number of generated plans for the first step is set to 10000.