A Formal Metareasoning Model of Concurrent Planning and Execution

Authors: Amihay Elboher, Ava Bensoussan, Erez Karpas, Wheeler Ruml, Shahaf S. Shperberg, Eyal Shimony

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 7 (Empirical Evaluation): "Our experimental setting is inspired by movies such as Indiana Jones or Die Hard in which the hero is required to solve a puzzle before a deadline or suffer extreme consequences. As the water jugs problem from Die Hard is too easy, we use the 15-puzzle with the Manhattan distance heuristic instead. We collected data by solving 10,000 15-puzzle instances, recording the number of expansions required by A* to find an optimal solution from each initial state, as well as the actual solution length." (A sketch of this data-collection loop appears after the table.)
Researcher Affiliation | Academia | Amihay Elboher (1), Ava Bensoussan (1), Erez Karpas (2), Wheeler Ruml (3), Shahaf S. Shperberg (1), Eyal Shimony (1); (1) Ben-Gurion University, Israel; (2) Technion, Israel; (3) University of New Hampshire, USA
Pseudocode | Yes | Algorithm 1: Max-LETA*; Algorithm 2: K-Bounded A*; Algorithm 3: Reduce CoPE to SAE2; Algorithm 4: Schedule Actions; Algorithm 5: Demand-Execution SAE2 Algorithm. (A hedged reading of K-Bounded A* is sketched after the table.)
Open Source Code | Yes | The implementation can be found in the following repository: https://github.com/amihayelboher/CoPE
Open Datasets | No | The paper uses the 15-puzzle and generates its own data by solving 10,000 instances to create CoPE problems. Although the 15-puzzle is a well-known benchmark, the paper provides no link or citation to a pre-existing publicly available dataset used directly for training or evaluation; the experimental data was generated by the authors themselves.
Dataset Splits | No | The paper describes running algorithms and simulating outcomes by sampling from distributions, but it does not specify the training/validation/test splits common in machine-learning experiments. It evaluates the algorithms on generated CoPE instances rather than on models trained from split data.
Hardware Specification | No | The paper gives no details of the hardware used for the experiments (e.g., CPU or GPU model, memory). It mentions runtime requirements for integration into temporal planners, but not the hardware on which the empirical evaluation ran.
Software Dependencies | No | The paper mentions tools and algorithms such as OPTIC, UCT, and UCB1, but it does not give version numbers for any software dependency used in the implementation or experimental setup.
Experiment Setup | Yes | "In this setting, all base-level actions require the same amount of time units to be completed, denoted as dur(b); in our experiments, we considered dur(b) ∈ {1, 2, 3} (i.e., each 15-puzzle instance became three CoPE instances, differing only in the duration of the base-level action). ... Finally, to make the deadlines challenging, we used as the deadline for reaching the goal X_i = 4·h(i). ... MCTS with an exploration constant c = 2 and budgets of 10, 100, and 500 rollouts before selecting each time allocation." (See the instance-generation sketch after the table.)
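
Below is a minimal sketch of the data-collection step quoted under "Research Type": solving 15-puzzle instances with A* under the Manhattan distance heuristic while recording the number of expansions and the optimal solution length. This is an illustrative reconstruction, not code from the authors' repository; all function names are ours.

```python
import heapq
from typing import Dict, List, Tuple

State = Tuple[int, ...]
GOAL: State = tuple(range(1, 16)) + (0,)  # 0 denotes the blank

def manhattan(state: State) -> int:
    """Sum of the Manhattan distances of all tiles from their goal cells."""
    dist = 0
    for idx, tile in enumerate(state):
        if tile:  # skip the blank
            goal_idx = tile - 1
            dist += abs(idx // 4 - goal_idx // 4) + abs(idx % 4 - goal_idx % 4)
    return dist

def neighbors(state: State):
    """Yield the states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, 4)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 4 and 0 <= nc < 4:
            s = list(state)
            s[blank], s[nr * 4 + nc] = s[nr * 4 + nc], s[blank]
            yield tuple(s)

def astar(start: State) -> Tuple[int, int]:
    """Return (optimal solution length, number of expansions)."""
    open_heap: List[Tuple[int, int, State]] = [(manhattan(start), 0, start)]
    best_g: Dict[State, int] = {start: 0}
    expansions = 0
    while open_heap:
        _, g, state = heapq.heappop(open_heap)
        if g > best_g[state]:
            continue  # stale queue entry
        if state == GOAL:
            return g, expansions
        expansions += 1
        for succ in neighbors(state):
            if g + 1 < best_g.get(succ, 1 << 30):
                best_g[succ] = g + 1
                heapq.heappush(open_heap, (g + 1 + manhattan(succ), g + 1, succ))
    raise ValueError("unsolvable instance")

# The collection loop over 10,000 solvable random start states would record
# the (solution length, expansions) pair returned by astar() for each one.
```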
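
The paper's Algorithm 2 is titled "K-Bounded A*". One plausible reading, assumed here, is A* cut off after at most K node expansions; consult the paper or repository for the authors' exact definition. The sketch reuses GOAL, manhattan(), and neighbors() from the block above.

```python
def k_bounded_astar(start: State, k: int):
    """A* with an expansion budget: expand at most k nodes.
    Returns (solution length or None, expansions actually used)."""
    open_heap: List[Tuple[int, int, State]] = [(manhattan(start), 0, start)]
    best_g: Dict[State, int] = {start: 0}
    expansions = 0
    while open_heap and expansions < k:
        _, g, state = heapq.heappop(open_heap)
        if g > best_g[state]:
            continue  # stale queue entry
        if state == GOAL:
            return g, expansions
        expansions += 1
        for succ in neighbors(state):
            if g + 1 < best_g.get(succ, 1 << 30):
                best_g[succ] = g + 1
                heapq.heappush(open_heap, (g + 1 + manhattan(succ), g + 1, succ))
    return None, expansions  # budget exhausted before reaching the goal
```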
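
Finally, a hedged sketch of the instance generation and MCTS settings quoted under "Experiment Setup": each solved 15-puzzle start state yields three CoPE instances with dur(b) ∈ {1, 2, 3} and deadline X_i = 4·h(i), and MCTS selects time allocations via UCB1 with exploration constant c = 2. The CopeInstance container and make_cope_instances() are hypothetical names; only the durations, the deadline formula, and c = 2 come from the paper. State, List, and manhattan() are reused from the first sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class CopeInstance:          # hypothetical container, not the authors' class
    start: State             # initial 15-puzzle state
    action_duration: int     # dur(b), identical for every base-level action
    deadline: int            # X_i, time units allowed to reach the goal

def make_cope_instances(start: State) -> List[CopeInstance]:
    """Each 15-puzzle instance becomes three CoPE instances differing only
    in the base-level action duration dur(b) ∈ {1, 2, 3}."""
    deadline = 4 * manhattan(start)  # X_i = 4·h(i), per the paper
    return [CopeInstance(start, dur, deadline) for dur in (1, 2, 3)]

def ucb1_score(mean_reward: float, parent_visits: int, child_visits: int,
               c: float = 2.0) -> float:
    """UCB1 value used by MCTS to pick the next time allocation to try;
    c = 2 is the exploration constant reported in the paper."""
    if child_visits == 0:
        return float("inf")  # visit every allocation at least once
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)
```

Under the reported budgets, the MCTS loop would run 10, 100, or 500 such rollouts, each time descending to the child maximizing ucb1_score, before committing to a time allocation.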