Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Online Planning with Lookahead Policies
Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we devise a multi-step greedy RTDP algorithm, which we call h-RTDP, that replaces the 1-step greedy policy with an h-step lookahead policy. We analyze h-RTDP in its exact form and establish that increasing the lookahead horizon, h, results in an improved sample complexity, at the cost of additional computation. This is the first work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of h-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of h-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors. |
| Researcher Affiliation | Collaboration | Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor. Affiliations listed: Microsoft Research, New York, NY; Technion, Israel; Google Research; Nvidia Research. Part of this work was done during an internship at Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 Real-Time DP (RTDP) and Algorithm 2 RTDP with Lookahead (h-RTDP) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct experiments with datasets and therefore does not mention public dataset availability or provide access information. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments with datasets and therefore does not specify training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not specify hardware used. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not describe an experimental setup with specific hyperparameters, training configurations, or system-level settings for empirical runs. |
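The Pseudocode row cites Algorithm 2, h-RTDP, which swaps RTDP's 1-step greedy action choice for an h-step lookahead and updates the value of each visited state to its lookahead value. A minimal tabular sketch of that idea follows; the dictionary-based MDP representation and all names here are illustrative assumptions for a finite-horizon setting, not the paper's implementation:

```python
def h_step_lookahead(s, t, V, P, R, h, H, actions, states):
    """Solve the h-step lookahead problem from state s at timestep t,
    bootstrapping from the current value estimates V.

    Sketch conventions (assumed, not from the paper):
    P[s][a][s2] is a transition probability, R[s][a] an expected
    reward, V[t][s] a tabular value estimate, H the episode horizon.
    Returns (lookahead value, first action of an optimal h-step plan).
    """
    def plan(state, depth):
        # At the lookahead frontier (or the episode horizon),
        # bootstrap from the stored value estimates.
        if depth == h or t + depth == H:
            return V[t + depth][state], None
        best_q, best_a = float("-inf"), None
        for a in actions:
            q = R[state][a] + sum(
                P[state][a][s2] * plan(s2, depth + 1)[0] for s2 in states
            )
            if q > best_q:
                best_q, best_a = q, a
        return best_q, best_a

    return plan(s, 0)


def h_rtdp_episode(s0, V, P, R, h, H, actions, states, sample_next):
    """One episode of h-RTDP (sketch): act h-step greedily and update
    only the values of states actually visited along the trajectory."""
    s = s0
    for t in range(H):
        v, a = h_step_lookahead(s, t, V, P, R, h, H, actions, states)
        V[t][s] = v            # value update at the visited state only
        s = sample_next(s, a)  # environment transition
    return V
```

With h = 1 this reduces to the ordinary 1-step greedy backup of RTDP (Algorithm 1); a larger h makes each decision costlier (the naive recursion branches over |A| x |S| per depth) in exchange for the improved sample complexity the paper proves.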