reproducibilityindex.ai

Horizon-Free Regret for Linear Markov Decision Processes

Authors: Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon Shaolei Du

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	We give the ﬁrst horizon-free bound for the popular linear MDP setting where the size of the transition model can be exponentially large or even uncountable. In contrast to prior works which explicitly estimate the transition model and compute the inhomogeneous value functions at different time steps, we directly estimate the value functions and conﬁdence sets. We obtain the horizon-free bound by: (1) maintaining multiple weighted least square estimators for the value functions; and (2) a structural lemma which shows the maximal total variation of the inhomogeneous value functions is bounded by a polynomial factor of the feature dimension.
Researcher Affiliation	Academia	Department of Electrical and Computer Engineering, Princeton University; email: {zz5478,jasonlee}@princeton.edu. Department of Statistics and Data Science, University of Pennsylvania; email: yuxinc@wharton.upenn.edu. Paul G. Allen School of Computer Science and Engineering, University of Washington; email: ssdu@cs.washington.edu.
Pseudocode	Yes	Algorithm 1 Main Algorithm; Algorithm 2 HF Estimator
Open Source Code	No	The paper does not provide any concrete access to source code, nor does it explicitly state that source code will be released or is available in supplementary materials.
Open Datasets	No	The paper is theoretical and does not conduct empirical studies with datasets. Therefore, it does not provide access information for a publicly available or open dataset for training.
Dataset Splits	No	The paper is theoretical and does not conduct empirical studies with datasets. Therefore, it does not provide specific dataset split information for training, validation, or testing.
Hardware Specification	No	The paper does not provide any specific hardware details used for running its experiments, as it focuses on theoretical contributions.
Software Dependencies	No	The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate experiments.
Experiment Setup	No	The paper is theoretical and does not conduct empirical experiments, thus it does not include specific experimental setup details like hyperparameter values or training configurations.