Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Lower Bounds for Learning in Revealing POMDPs
Authors: Fan Chen, Huan Wang, Caiming Xiong, Song Mei, Yu Bai
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for multistep revealing POMDPs, we show that (1) the latent state-space dependence is at least ā¦p S1.5q in the PAC sample complexity, which is notably harder than the rĪp Sq scaling for fully-observable MDPs; (2) Any polynomial sublinear regret is at least ā¦p T 2{3q, suggesting its fundamental difference from the single-step case where r Op ? Tq regret is achievable. Technically, our hard instance construction adapts techniques in distribution testing, which is new to the RL literature and may be of independent interest. We also complement our results with new sharp regret upper bounds for strongly B-stable PSRs, which include single-step revealing POMDPs as a special case. |
| Researcher Affiliation | Collaboration | 1Peking University 2Salesforce AI Research 3UC Berkeley. Correspondence to: Fan Chen <EMAIL>, Song Mei <EMAIL>, Yu Bai <EMAIL>. |
| Pseudocode | No | The paper mentions 'Algorithm 1 OPTIMISTIC MAXIMUM LIKELIHOOD ESTIMATION (OMLE)' in Appendix H.2, but this is a previously existing algorithm cited from other works, not pseudocode developed by the authors for their own contributions. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The paper is theoretical in nature, focusing on establishing lower bounds and does not conduct experiments on datasets, thus no dataset is used or made public. |
| Dataset Splits | No | The paper is theoretical in nature, focusing on establishing lower bounds and does not conduct experiments on datasets, thus no dataset splits are provided. |
| Hardware Specification | No | The paper is theoretical and does not report on experiments that would require specific hardware, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not report on experiments that would require specific software dependencies, thus none are mentioned with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on lower bounds and proofs, not experimental implementations that would require details on hyperparameters or training settings. |