Online Planning with Lookahead Policies

Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this we devise a multi-step greedy RTDP algorithm, which we call h-RTDP, that replaces the 1-step greedy policy with a h-step lookahead policy. We analyze h-RTDP in its exact form and establish that increasing the lookahead horizon, h, results in an improved sample complexity, with the cost of additional computations. This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of h-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of h-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors.
Researcher Affiliation Collaboration Yonathan Efroni Mohammad Ghavamzadeh Shie Mannor Part of this work was done during an internship in Facebook AI Research Microsoft Research, New York, NY Technion, Israel Google Research Nvidia Research
Pseudocode Yes Algorithm 1 Real-Time DP (RTDP) and Algorithm 2 RTDP with Lookahead (h-RTDP)
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets No The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct experiments with datasets and therefore does not mention public dataset availability or provide access information.
Dataset Splits No The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments with datasets and therefore does not specify training/test/validation dataset splits.
Hardware Specification No The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not specify hardware used.
Software Dependencies No The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not provide specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and focuses on algorithm analysis and proofs; it does not describe an experimental setup with specific hyperparameters, training configurations, or system-level settings for empirical runs.