Lower Bounds for Learning in Revealing POMDPs

Authors: Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for multistep revealing POMDPs, we show that (1) the latent state-space dependence is at least Ω(S^{1.5}) in the PAC sample complexity, which is notably harder than the Θ̃(S) scaling for fully-observable MDPs; (2) any polynomial sublinear regret is at least Ω(T^{2/3}), suggesting its fundamental difference from the single-step case, where Õ(√T) regret is achievable. Technically, our hard instance construction adapts techniques in distribution testing, which is new to the RL literature and may be of independent interest. We also complement our results with new sharp regret upper bounds for strongly B-stable PSRs, which include single-step revealing POMDPs as a special case.
Researcher Affiliation | Collaboration | ¹Peking University, ²Salesforce AI Research, ³UC Berkeley. Correspondence to: Fan Chen <chern@pku.edu.cn>, Song Mei <songmei@berkeley.edu>, Yu Bai <yu.bai@salesforce.com>.
Pseudocode | No | The paper presents "Algorithm 1 OPTIMISTIC MAXIMUM LIKELIHOOD ESTIMATION (OMLE)" in Appendix H.2, but this is a previously existing algorithm cited from other works, not pseudocode developed by the authors for their own contributions.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | No | The paper is theoretical, focusing on establishing lower bounds; it does not conduct experiments on datasets, so no dataset is used or made public.
Dataset Splits | No | The paper is theoretical and conducts no experiments on datasets, so no dataset splits are provided.
Hardware Specification | No | The paper is theoretical and reports no experiments that would require specific hardware, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and reports no experiments that would require specific software dependencies, so none are mentioned with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on lower bounds and proofs rather than experimental implementations, so no hyperparameters or training settings are detailed.