Optimistic Planning by Regularized Dynamic Programming
Authors: Antoine Moulin, Gergely Neu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes... We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees. |
| Researcher Affiliation | Academia | Universitat Pompeu Fabra, Barcelona, Spain. Correspondence to: Antoine Moulin <antoine.moulin@upf.edu>, Gergely Neu <gergely.neu@gmail.com>. |
| Pseudocode | Yes | The overall procedure is presented as Algorithm 1. ... Algorithm 2 RAVI-UCB for tabular MDPs. ... Algorithm 3 RAVI-UCB for linear mixture MDPs. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is openly available. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments involving datasets or their public availability. The term 'train' is not used in the context of empirical data training. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with dataset splits. The term 'validation' appears once, in 'validation of existing assumptions', which is unrelated to data splitting. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments, so no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments, so no software dependencies or version numbers are mentioned. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with specific hyperparameter values or training configurations. The 'setup' discussed refers to the theoretical model setup. |