RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
Authors: Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. |
| Researcher Affiliation | Collaboration | Jeongyeol Kwon, University of Wisconsin-Madison, jeongyeol.kwon@wisc.edu; Shie Mannor, Technion / NVIDIA AI, shie@ee.technion.ac.il; Constantine Caramanis, University of Texas at Austin, constantine@utexas.edu; Yonathan Efroni, Meta AI, jonathan.efroni@gmail.com |
| Pseudocode | Yes | Algorithm 1 MDP-OMLE ... Algorithm 2 LMDP-OMLE |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not mention or use any publicly available datasets. |
| Dataset Splits | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not specify any dataset splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific hardware, and thus no hardware specifications are provided. |
| Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific setup details like hyperparameters or training configurations. |