Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bounding Quality in Diverse Planning
Authors: Michael Katz, Shirin Sohrabi, Octavian Udrea
Pages: 9805-9812
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental Evaluation: To empirically evaluate the feasibility of our suggested approach, we have implemented our diverse planners on top of the Diversity Score Computation component (Katz and Sohrabi 2019), using CPLEX v12.8.0 for solving the mixed integer linear programs. |
| Researcher Affiliation | Industry | 1 IBM Research, Yorktown Heights, NY, USA 2 Dataminr, New York, NY, USA |
| Pseudocode | No | The paper describes mixed integer linear program formulations, but does not present them as structured pseudocode or an algorithm block labeled 'Algorithm'. |
| Open Source Code | Yes | The code is available at https://github.com/IBM/diversescore. |
| Open Datasets | Yes | The benchmark set consists of all STRIPS benchmarks from optimal tracks of International Planning Competitions (IPC) 1998-2018, a total of 1797 tasks in 64 domains. |
| Dataset Splits | No | The paper describes using a 'benchmark set' from International Planning Competitions, but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | The experiments were performed on Intel(R) Xeon(R) CPU E7-8837 @2.67GHz machines, with the time and memory limit of 30min and 2GB, respectively. |
| Software Dependencies | Yes | using CPLEX v12.8.0 for solving the mixed integer linear programs. |
| Experiment Setup | Yes | We run these planners with a 29min time bound, to allow at least one minute for the second step. In all cases, the overall time bound for both steps is 30min. Further, to avoid generating a larger amount of plans, the overall bound on the number of generated plans for the first step is set to 10000. |
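
The time and plan limits quoted in the Hardware Specification and Experiment Setup rows can be written down as a small configuration sketch. This is an illustration only, not code from the paper or its repository: the class `TwoStepBudget` and the method `second_step_seconds` are hypothetical names, and the clamping behavior is an assumption about how a first step that reaches its bound would be treated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoStepBudget:
    """Hypothetical container for the limits quoted in the table above."""

    overall_minutes: int = 30     # overall time bound for both steps
    first_step_minutes: int = 29  # time bound for the planners in the first step
    max_plans: int = 10_000       # bound on plans generated in the first step
    memory_gb: int = 2            # memory limit per run

    def second_step_seconds(self, first_step_elapsed_s: float) -> float:
        """Seconds left for the second step after the first step finishes.

        Assumption: the first step never consumes more than its own bound,
        so at least one minute always remains for the second step.
        """
        used = min(first_step_elapsed_s, self.first_step_minutes * 60)
        return self.overall_minutes * 60 - used


budget = TwoStepBudget()
```

Under these assumptions, a first step that uses its full 29 minutes leaves 60 seconds for the second step, and a first step that finishes early leaves the remainder of the 30 minutes.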