Online Planning with Lookahead Policies
Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we devise a multi-step greedy RTDP algorithm, which we call h-RTDP, that replaces the 1-step greedy policy with an h-step lookahead policy. We analyze h-RTDP in its exact form and establish that increasing the lookahead horizon, h, results in improved sample complexity, at the cost of additional computation. This is the first work to prove improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of h-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of h-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors. |
| Researcher Affiliation | Collaboration | Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor. Affiliations mentioned: Technion, Israel; Microsoft Research, New York, NY; Google Research; Nvidia Research. Part of this work was done during an internship at Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 Real-Time DP (RTDP) and Algorithm 2 RTDP with Lookahead (h-RTDP) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical, focusing on algorithm analysis and proofs; it conducts no experiments with datasets and therefore does not mention public dataset availability or access information. |
| Dataset Splits | No | The paper is theoretical and conducts no empirical experiments, so it does not specify training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and conducts no empirical experiments, so it does not specify the hardware used. |
| Software Dependencies | No | The paper is theoretical and conducts no empirical experiments, so it does not list software dependencies or version numbers. |
| Experiment Setup | No | The paper is theoretical and conducts no empirical experiments, so it does not describe an experimental setup with hyperparameters, training configurations, or system-level settings. |
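As the table notes, the paper gives pseudocode for RTDP (Algorithm 1) and h-RTDP (Algorithm 2) but no released implementation. The sketch below is a minimal, hypothetical rendering of the exact h-RTDP idea on a finite-horizon tabular MDP with a known model: follow the h-step greedy (lookahead) policy from an optimistic initialization, updating values only along the visited trajectory. All names (`lookahead`, `h_rtdp`) and the specific MDP encoding are illustrative assumptions, not the authors' code.

```python
import numpy as np

def lookahead(V, P, R, s, t, h, H):
    """Optimal h-step lookahead value and first action from state s at time t,
    bootstrapping from the current estimate V once depth h (or horizon H) is hit."""
    if t == H:
        return 0.0, None
    if h == 0:
        return V[t][s], None
    best_q, best_a = -np.inf, None
    for a in range(R.shape[1]):
        # Bellman backup one step deeper: reward plus expected (h-1)-step value.
        future = np.array([lookahead(V, P, R, s2, t + 1, h - 1, H)[0]
                           for s2 in range(P.shape[2])])
        q = R[s, a] + P[s, a] @ future
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a

def h_rtdp(P, R, s0, H, h, episodes, rng=None):
    """Exact h-RTDP sketch: act h-step greedily, update only visited states.

    P: transition tensor of shape (nS, nA, nS); R: rewards in [0, 1], shape (nS, nA).
    """
    rng = rng or np.random.default_rng(0)
    nS = P.shape[0]
    # Optimistic initialization: V[t][s] = H - t upper-bounds returns for R in [0, 1].
    V = [np.full(nS, float(H - t)) for t in range(H + 1)]
    for _ in range(episodes):
        s = s0
        for t in range(H):
            v, a = lookahead(V, P, R, s, t, h, H)
            V[t][s] = v                          # update only the visited state
            s = rng.choice(nS, p=P[s, a])        # sample next state from the model
    return V
```

With a larger h, each step performs a deeper (more expensive) exact search but propagates value information h steps at a time, which is the computation/sample-complexity trade-off the paper analyzes.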