Reward-Mixing MDPs with Few Latent Contexts are Learnable

Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work, we resolve several open questions for the general RMMDP setting. We consider an arbitrary M ≥ 2 and provide a sample-efficient algorithm EM² that outputs an ϵ-optimal policy using Õ(ϵ^{-2} · S^d A^d · poly(H, Z)^d) episodes... We also provide a (SA)^{Ω(√M)}/ϵ^2 lower bound, supporting that super-polynomial sample complexity in M is necessary.
Researcher Affiliation | Collaboration | ¹Wisconsin Institute for Discovery, University of Wisconsin-Madison, USA; ²Meta, New York; ³Department of Electrical and Computer Engineering, University of Texas at Austin, USA; ⁴Technion, Israel; ⁵NVIDIA Research
Pseudocode | Yes | Algorithm 1: Estimate and Match Moments (EM²) ... Algorithm 2
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not use or provide access to any specific datasets for training.
Dataset Splits | No | The paper is theoretical and does not describe experimental validation procedures with dataset splits.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup with hyperparameters or training configurations.
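
As a rough, non-authoritative illustration of the upper bound quoted in the Research Type row, the sketch below evaluates the stated episode scaling Õ(ϵ^{-2} · S^d A^d · poly(H, Z)^d), taking d = min(2M−1, H) as the paper defines it. The function name em2_episode_scaling, the H·Z stand-in for the unspecified poly(H, Z) factor, and all parameter values are assumptions made purely for illustration.

```python
# Hypothetical sketch of the sample-complexity scaling quoted above:
# O~(eps^{-2} * S^d * A^d * poly(H, Z)^d) episodes, with d = min(2M - 1, H).
# The poly(H, Z) term is unspecified in the quoted bound, so H * Z is used
# here as a placeholder; all names and numbers are illustrative only.

def em2_episode_scaling(S, A, H, Z, M, eps):
    """Order-of-magnitude episode count implied by the stated upper bound."""
    d = min(2 * M - 1, H)   # effective degree: grows with the number of contexts M
    poly_hz = H * Z          # placeholder for the unspecified poly(H, Z) factor
    return (S * A * poly_hz) ** d / eps ** 2

if __name__ == "__main__":
    # Toy instance: 10 states, 5 actions, horizon 20, reward support size 2.
    for M in (2, 3, 5):
        n = em2_episode_scaling(S=10, A=5, H=20, Z=2, M=M, eps=0.1)
        print(f"M={M}: d={min(2 * M - 1, 20)}, ~{n:.2e} episodes")
```

Running the toy loop makes the super-polynomial dependence on M concrete: each extra latent context raises the exponent d by two, multiplying the episode count by several orders of magnitude, which is consistent with the (SA)^{Ω(√M)}/ϵ^2 lower bound quoted above.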