RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Authors: Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm.
Researcher Affiliation | Collaboration | Jeongyeol Kwon, University of Wisconsin-Madison, jeongyeol.kwon@wisc.edu; Shie Mannor, Technion / NVIDIA AI, shie@ee.technion.ac.il; Constantine Caramanis, University of Texas at Austin, constantine@utexas.edu; Yonathan Efroni, Meta AI, jonathan.efroni@gmail.com
Pseudocode | Yes | Algorithm 1 MDP-OMLE ... Algorithm 2 LMDP-OMLE
Open Source Code | No | The paper neither states that source code will be released nor links to a code repository.
Open Datasets | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. It therefore does not mention or use any publicly available datasets.
Dataset Splits | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. It therefore does not specify any dataset splits for training, validation, or testing.
Hardware Specification | No | This is a theoretical paper focused on algorithms and proofs. It describes no experiments that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs. It describes no experiments or implementations that would require specific software dependencies with version numbers.
Experiment Setup | No | This is a theoretical paper focused on algorithms and proofs. It describes no experiments that would require setup details such as hyperparameters or training configurations.