Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation

Authors: Jeongyeol Kwon, Shie Mannor, Constantine Caramanis, Yonathan Efroni

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. |
| Researcher Affiliation | Collaboration | Jeongyeol Kwon (University of Wisconsin-Madison); Shie Mannor (Technion / NVIDIA AI); Constantine Caramanis (University of Texas at Austin); Yonathan Efroni (Meta AI) |
| Pseudocode | Yes | Algorithm 1 MDP-OMLE ... Algorithm 2 LMDP-OMLE |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not mention or use any publicly available datasets. |
| Dataset Splits | No | This is a theoretical paper focused on algorithms and proofs, not empirical evaluation. Therefore, it does not specify any dataset splits for training, validation, or testing. |
| Hardware Specification | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific hardware, and thus no hardware specifications are provided. |
| Software Dependencies | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments or implementations that would require specific software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper focused on algorithms and proofs. It does not describe any experiments that would require specific setup details like hyperparameters or training configurations. |