Reward-Mixing MDPs with Few Latent Contexts are Learnable
Authors: Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we resolve several open questions for the general RMMDP setting. We consider an arbitrary M ≥ 2 and provide a sample-efficient algorithm EM² that outputs an ϵ-optimal policy using Õ(ϵ^{-2} S^d A^d poly(H, Z)^d) episodes... We also provide a (SA)^{Ω(M)}/ϵ^2 lower bound, supporting that super-polynomial sample complexity in M is necessary. |
| Researcher Affiliation | Collaboration | ¹Wisconsin Institute for Discovery, University of Wisconsin-Madison, USA; ²Meta, New York; ³Department of Electrical and Computer Engineering, University of Texas at Austin, USA; ⁴Technion, Israel; ⁵Nvidia Research. |
| Pseudocode | Yes | Algorithm 1 Estimate and Match Moments (EM²) ... Algorithm 2 |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or provide access to any specific datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation procedures with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup with hyperparameters or training configurations. |
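
As a readability aid, the two sample-complexity bounds quoted in the Research Type row above can be restated in display math. The sketch below follows the paper's abstract; the definition d = min(2M − 1, H) is taken from the abstract and is the only symbol not visible in the quote above.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Upper bound: episodes needed by EM^2 to output an epsilon-optimal policy,
% where (per the abstract) d = min(2M - 1, H).
\[
  \tilde{O}\!\left( \epsilon^{-2}\, S^{d} A^{d}\, \operatorname{poly}(H, Z)^{d} \right),
  \qquad d = \min(2M - 1,\; H).
\]
% Lower bound, supporting that super-polynomial sample complexity
% in the number of latent contexts M is necessary.
\[
  \frac{(SA)^{\Omega(M)}}{\epsilon^{2}}
\]
\end{document}
```

Note that for any fixed number of contexts M, the exponent d is bounded by 2M − 1, so the upper bound is polynomial in S and A; the lower bound shows the dependence on M itself must be super-polynomial.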