Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Authors: Asaf Cassel, Aviv Rosenberg
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper eliminates the warm-up phase required by previous policy optimization (PO) algorithms for linear MDPs, replacing it with a simple and efficient contraction mechanism. The PO algorithm achieves rate-optimal regret with improved dependence on the other problem parameters (horizon and function-approximation dimension) in two fundamental settings: adversarial losses with full-information feedback and stochastic losses with bandit feedback. |
| Researcher Affiliation | Collaboration | Asaf Cassel, Tel Aviv University (acassel@mail.tau.ac.il); Aviv Rosenberg, Google Research (avivros@google.com) |
| Pseudocode | Yes | Algorithm 1: Contracted Features PO for linear MDPs (an illustrative sketch follows this table). |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | This paper focuses on theoretical contributions and does not involve empirical studies with specific datasets. |
| Dataset Splits | No | This paper is theoretical and does not describe any experimental validation process involving dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not detail an empirical experimental setup with hyperparameters or training configurations. |
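For orientation, the sketch below illustrates the generic shape of a contracted-features policy-optimization loop of the kind Algorithm 1 describes: least-squares Q-estimates computed over deliberately contracted features with an elliptical optimistic bonus, followed by an exponential-weights (mirror-descent) policy update. This is not the paper's exact Algorithm 1; all problem sizes, the contraction factor `rho`, the step size `eta`, the bonus coefficient `beta`, and the toy dynamics are assumptions made for illustration. The contraction (`phi_tilde = rho * phi`) is what keeps the early regression outputs bounded, which is the role the paper assigns to its contraction mechanism in place of a warm-up phase.

```python
import numpy as np

# Toy problem sizes and tuning constants -- all hypothetical, chosen for
# illustration rather than taken from the paper.
n_states, n_actions, H, K, d = 5, 3, 4, 200, 5
eta = 0.5    # mirror-descent (exponential-weights) step size, assumed
beta = 0.1   # exploration-bonus coefficient, assumed
rho = 0.9    # feature-contraction factor standing in for the paper's mechanism

rng = np.random.default_rng(0)

# Random unit-norm feature map phi(s, a) in R^d. The contracted copy
# phi_tilde = rho * phi keeps least-squares value estimates bounded from
# the first episode on (the paper's exact construction differs).
phi = rng.normal(size=(n_states, n_actions, d))
phi /= np.linalg.norm(phi, axis=-1, keepdims=True)
phi_tilde = rho * phi

# Toy stochastic dynamics and losses (a generic MDP testbed, not an
# exact linear MDP).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
loss = rng.uniform(size=(n_states, n_actions))

# Uniform initial policy per step h: pi[h, s] is a distribution over actions.
pi = np.full((H, n_states, n_actions), 1.0 / n_actions)

for k in range(K):
    # Roll out one episode with the current policy.
    traj, s = [], int(rng.integers(n_states))
    for h in range(H):
        a = rng.choice(n_actions, p=pi[h, s])
        s_next = rng.choice(n_states, p=P[s, a])
        traj.append((h, s, a, loss[s, a], s_next))
        s = s_next

    # Backward least-squares Q estimates over the contracted features,
    # minus an elliptical (optimistic, since we minimize loss) bonus.
    # A real implementation would regress over all past episodes; one
    # trajectory keeps the sketch short.
    Q_hat = np.zeros((H, n_states, n_actions))
    V_next = np.zeros(n_states)  # value at step h+1 under the current policy
    for h in reversed(range(H)):
        Lambda, b = np.eye(d), np.zeros(d)
        for hh, s_h, a_h, c_h, s_n in traj:
            if hh != h:
                continue
            x = phi_tilde[s_h, a_h]
            Lambda += np.outer(x, x)
            b += x * (c_h + V_next[s_n])   # regression target: loss-to-go
        w = np.linalg.solve(Lambda, b)
        Lambda_inv = np.linalg.inv(Lambda)
        x_all = phi_tilde.reshape(-1, d)
        bonus = beta * np.sqrt(np.einsum("nd,de,ne->n", x_all, Lambda_inv, x_all))
        Q_hat[h] = (x_all @ w - bonus).reshape(n_states, n_actions)
        V_next = (pi[h] * Q_hat[h]).sum(axis=-1)

    # Exponential-weights policy update: downweight high-loss actions.
    pi *= np.exp(-eta * Q_hat)
    pi /= pi.sum(axis=-1, keepdims=True)

print("final greedy actions per (h, s):", Q_hat.argmin(axis=-1))
```

Because the contracted features bound every regression output from episode one, the loop above never needs the separate reward-free exploration phase that warm-up-based PO algorithms run before their main loop.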