Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
Authors: Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. |
| Researcher Affiliation | Collaboration | Jeongyeol Kwon University of Wisconsin-Madison EMAIL Shie Mannor Technion / NVIDIA AI EMAIL Constantine Caramanis University of Texas at Austin EMAIL Yonathan Efroni Meta AI EMAIL |
| Pseudocode | Yes | Algorithm 1 MDP-OMLE ... Algorithm 2 LMDP-OMLE |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not mention or use any publicly available datasets. |
| Dataset Splits | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not specify any dataset splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific hardware, and thus no hardware specifications are provided. |
| Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific setup details like hyperparameters or training configurations. |