Horizon-Free Regret for Linear Markov Decision Processes
Authors: Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon Shaolei Du
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We give the first horizon-free bound for the popular linear MDP setting where the size of the transition model can be exponentially large or even uncountable. In contrast to prior works which explicitly estimate the transition model and compute the inhomogeneous value functions at different time steps, we directly estimate the value functions and confidence sets. We obtain the horizon-free bound by: (1) maintaining multiple weighted least square estimators for the value functions; and (2) a structural lemma which shows the maximal total variation of the inhomogeneous value functions is bounded by a polynomial factor of the feature dimension. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Princeton University; email: {zz5478,jasonlee}@princeton.edu. Department of Statistics and Data Science, University of Pennsylvania; email: yuxinc@wharton.upenn.edu. Paul G. Allen School of Computer Science and Engineering, University of Washington; email: ssdu@cs.washington.edu. |
| Pseudocode | Yes | Algorithm 1 Main Algorithm; Algorithm 2 HF Estimator |
| Open Source Code | No | The paper does not provide any concrete access to source code, nor does it explicitly state that source code will be released or is available in supplementary materials. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical studies with datasets. Therefore, it does not provide access information for a publicly available or open dataset for training. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical studies with datasets. Therefore, it does not provide specific dataset split information for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments, as it focuses on theoretical contributions. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments, thus it does not include specific experimental setup details like hyperparameter values or training configurations. |