Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Authors: Asaf Cassel, Aviv Rosenberg
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper eliminates this undesired warm-up phase, replacing it with a simple and efficient contraction mechanism. Our PO algorithm achieves rate-optimal regret with improved dependence on the other parameters of the problem (horizon and function approximation dimension) in two fundamental settings: adversarial losses with full-information feedback and stochastic losses with bandit feedback. |
| Researcher Affiliation | Collaboration | Asaf Cassel Tel Aviv University EMAIL Aviv Rosenberg Google Research EMAIL |
| Pseudocode | Yes | Algorithm 1 Contracted Features PO for linear MDPs |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | This paper focuses on theoretical contributions and does not involve empirical studies with specific datasets. |
| Dataset Splits | No | This paper is theoretical and does not describe any experimental validation process involving dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup or the specific hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not detail an empirical experimental setup with hyperparameters or training configurations. |