Online Planning with Lookahead Policies
Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we devise a multi-step greedy RTDP algorithm, which we call h-RTDP, that replaces the 1-step greedy policy with an h-step lookahead policy. We analyze h-RTDP in its exact form and establish that increasing the lookahead horizon, h, results in improved sample complexity, at the cost of additional computation. This is the first work to prove improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of h-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of h-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors. |
| Researcher Affiliation | Collaboration | Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor. Affiliations mentioned: Technion, Israel; Microsoft Research, New York, NY; Google Research; Nvidia Research. Part of this work was done during an internship at Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 Real-Time DP (RTDP) and Algorithm 2 RTDP with Lookahead (h-RTDP) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical, focusing on algorithm analysis and proofs; it conducts no experiments with datasets and therefore does not mention public dataset availability or access information. |
| Dataset Splits | No | The paper is theoretical and conducts no empirical experiments, so it does not specify training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and conducts no empirical experiments, so it does not specify the hardware used. |
| Software Dependencies | No | The paper is theoretical and conducts no empirical experiments, so it does not list software dependencies or version numbers. |
| Experiment Setup | No | The paper is theoretical and conducts no empirical experiments, so it does not describe an experimental setup with hyperparameters, training configurations, or system-level settings. |
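As the table notes, the paper gives pseudocode for RTDP (Algorithm 1) and h-RTDP (Algorithm 2) but no released implementation. The sketch below is a minimal, hypothetical rendering of the exact h-RTDP idea on a finite-horizon tabular MDP with a known model: follow the h-step greedy (lookahead) policy from an optimistic initialization, updating values only along the visited trajectory. All names (`lookahead`, `h_rtdp`) and the specific MDP encoding are illustrative assumptions, not the authors' code.

```python
import numpy as np

def lookahead(V, P, R, s, t, h, H):
    """Optimal h-step lookahead value and first action from state s at time t,
    bootstrapping from the current estimate V once depth h (or horizon H) is hit."""
    if t == H:
        return 0.0, None
    if h == 0:
        return V[t][s], None
    best_q, best_a = -np.inf, None
    for a in range(R.shape[1]):
        # Bellman backup one step deeper: reward plus expected (h-1)-step value.
        future = np.array([lookahead(V, P, R, s2, t + 1, h - 1, H)[0]
                           for s2 in range(P.shape[2])])
        q = R[s, a] + P[s, a] @ future
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a

def h_rtdp(P, R, s0, H, h, episodes, rng=None):
    """Exact h-RTDP sketch: act h-step greedily, update only visited states.

    P: transition tensor of shape (nS, nA, nS); R: rewards in [0, 1], shape (nS, nA).
    """
    rng = rng or np.random.default_rng(0)
    nS = P.shape[0]
    # Optimistic initialization: V[t][s] = H - t upper-bounds returns for R in [0, 1].
    V = [np.full(nS, float(H - t)) for t in range(H + 1)]
    for _ in range(episodes):
        s = s0
        for t in range(H):
            v, a = lookahead(V, P, R, s, t, h, H)
            V[t][s] = v                          # update only the visited state
            s = rng.choice(nS, p=P[s, a])        # sample next state from the model
    return V
```

With a larger h, each step performs a deeper (more expensive) exact search but propagates value information h steps at a time, which is the computation/sample-complexity trade-off the paper analyzes.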