Provable Reinforcement Learning with a Short-Term Memory
Authors: Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel moment matching approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. |
| Researcher Affiliation | Collaboration | 1Microsoft Research, New York 2Princeton University. Correspondence to: Sobhan Miryoosefi <miryoosefi@cs.princeton.edu>. |
| Pseudocode | Yes | Algorithm 1 m-GOLF: GOLF for m-step decodable POMDP |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or providing a link to a code repository. |
| Open Datasets | No | This paper is purely theoretical and does not involve experimental training on a dataset. |
| Dataset Splits | No | This paper is purely theoretical and does not involve experimental validation splits. |
| Hardware Specification | No | This paper is purely theoretical and does not report on experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | This paper is purely theoretical, focusing on mathematical proofs and algorithm design, and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | This paper is purely theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings. |
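Since the paper is theoretical, no reference implementation exists; the sketch below only illustrates the central object the table refers to, a short-term-memory policy that acts on a window of the last m observations (as in the paper's m-step decodable POMDP setting). The class name `ShortTermMemoryPolicy` and the toy decision rule are assumptions for illustration, not the paper's m-GOLF algorithm.

```python
from collections import deque

class ShortTermMemoryPolicy:
    """Hypothetical sketch: a policy conditioned only on the last m
    observations. The decision rule `act` is a placeholder supplied by
    the caller, not the moment-matching procedure from the paper."""

    def __init__(self, m, act):
        self.m = m
        self.act = act  # maps a tuple of at most m recent observations to an action
        self.window = deque(maxlen=m)  # automatically drops observations older than m steps

    def step(self, observation):
        self.window.append(observation)
        return self.act(tuple(self.window))

# Toy usage with m = 3: the action depends on the parity of the
# sum of the most recent (up to 3) observations.
policy = ShortTermMemoryPolicy(3, act=lambda w: sum(w) % 2)
actions = [policy.step(o) for o in [1, 0, 1, 1, 0]]  # → [1, 1, 0, 0, 0]
```

The point of the sketch is that the policy's input space grows with m, not with the full horizon, which mirrors why the paper's sample-complexity bounds scale exponentially in the short memory length m rather than in the problem horizon.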