Metareasoning for Planning Under Uncertainty

Authors: Christopher H. Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a set of experiments showing the performance of these algorithms versus baselines in several synthetic domains with different properties, and characterize their performance with a measure that we call the metareasoning gap, a measure of the potential for improvement from metareasoning. The experiments demonstrate that the proposed techniques excel when the metareasoning gap is large.
Researcher Affiliation | Collaboration | Christopher H. Lin (University of Washington, Seattle, WA; chrislin@cs.washington.edu); Andrey Kolobov, Ece Kamar, Eric Horvitz (Microsoft Research, Redmond, WA; {akolobov,eckamar,horvitz}@microsoft.com)
Pseudocode | No | The paper describes algorithms in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology, nor does it provide any links to a code repository.
Open Datasets | No | The paper describes using 'synthetic domains' and a '100x100 grid world' built by the authors, but it does not provide concrete access information (link, DOI, citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We set the parameters of the domain as follows so that there is a policy that can get the agent to the goal with a small number of steps (in tens instead of hundreds) and where the winds significantly influence the number of steps needed to get to the goal: The agent can move 11 cells at a time and the wind has a pushing power of 10 cells. We vary the cost of thinking and acting between 1 and 15. Each BRTDP trajectory is 50 actions long.
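
Since no code is released, anyone replicating the setup would need to record the reported parameters themselves. The following is a minimal sketch of one way to do that in Python. It is not the authors' code: the class name `GridWorldSetup`, its field names, and the `cost_grid` helper are hypothetical, and the integer sweep over cost pairs is an assumption, since the paper only states that the costs of thinking and acting vary between 1 and 15.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GridWorldSetup:
    """Hypothetical container for the parameters quoted in the table above."""
    grid_size: int = 100             # '100x100 grid world'
    agent_step_cells: int = 11       # agent can move 11 cells at a time
    wind_push_cells: int = 10        # wind has a pushing power of 10 cells
    brtdp_trajectory_len: int = 50   # each BRTDP trajectory is 50 actions long
    min_cost: int = 1                # lower bound on thinking/acting cost
    max_cost: int = 15               # upper bound on thinking/acting cost

    def cost_grid(self, step: int = 1):
        """Enumerate (thinking_cost, acting_cost) pairs.

        Assumption: an integer sweep with the given step; the paper does not
        say which values in [1, 15] were actually used.
        """
        costs = range(self.min_cost, self.max_cost + 1, step)
        return list(product(costs, costs))


setup = GridWorldSetup()
pairs = setup.cost_grid()
print(len(pairs), pairs[0], pairs[-1])
```

Keeping the values in a frozen dataclass makes it harder to change a parameter by accident between runs; the sweep granularity can be adjusted through `step` once the values actually used are known.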