Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Online Planning with Lookahead Policies
Authors: Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this we devise a multi-step greedy RTDP algorithm, which we call h-RTDP, that replaces the 1-step greedy policy with a h-step lookahead policy. We analyze h-RTDP in its exact form and establish that increasing the lookahead horizon, h, results in an improved sample complexity, with the cost of additional computations. This is the ο¬rst work that proves improved sample complexity as a result of increasing the lookahead horizon in online planning. We then analyze the performance of h-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of h-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors. |
| Researcher Affiliation | Collaboration | Yonathan Efroni Mohammad Ghavamzadeh Shie Mannor Part of this work was done during an internship in Facebook AI Research Microsoft Research, New York, NY Technion, Israel Google Research Nvidia Research |
| Pseudocode | Yes | Algorithm 1 Real-Time DP (RTDP) and Algorithm 2 RTDP with Lookahead (h-RTDP) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct experiments with datasets and therefore does not mention public dataset availability or provide access information. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments with datasets and therefore does not specify training/test/validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not specify hardware used. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not conduct empirical experiments and therefore does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm analysis and proofs; it does not describe an experimental setup with specific hyperparameters, training configurations, or system-level settings for empirical runs. |