Provable Reinforcement Learning with a Short-Term Memory

Authors: Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel moment matching approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations."
Researcher Affiliation | Collaboration | "(1) Microsoft Research, New York; (2) Princeton University. Correspondence to: Sobhan Miryoosefi <miryoosefi@cs.princeton.edu>."
Pseudocode | Yes | "Algorithm 1 m-GOLF: GOLF for m-step decodable POMDP" (see the sketch after this table)
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor a link to a code repository.
Open Datasets | No | This paper is purely theoretical and does not involve experimental training on a dataset.
Dataset Splits | No | This paper is purely theoretical and does not involve training, validation, or test dataset splits.
Hardware Specification | No | This paper is purely theoretical and does not report experiments, so no hardware specifications are mentioned.
Software Dependencies | No | This paper is purely theoretical, focusing on mathematical proofs and algorithm design, and does not mention any software dependencies with specific version numbers.
Experiment Setup | No | This paper is purely theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings.
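For orientation, the sketch below illustrates the general GOLF-style pattern that m-GOLF instantiates: optimistically select a value function from a candidate class, roll out its greedy policy over length-m windows of recent observations and actions ("megastates"), and eliminate candidates with large empirical Bellman error. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: the names `m_golf_sketch`, `rollout`, `squared_bellman_error`, and `QFunction` are hypothetical, and the function class is assumed finite so the confidence set can be stored as an explicit list.

```python
# Hedged, illustrative sketch of a GOLF-style "optimism plus elimination"
# loop for an m-step decodable POMDP. NOT the paper's Algorithm 1; all
# helper names are hypothetical stand-ins introduced for illustration.

from typing import Callable, List, Sequence, Tuple

# A "megastate" bundles the last m observations and actions; value
# functions in an m-step decodable POMDP may condition only on it.
Megastate = Tuple
Transition = Tuple[Megastate, int, float, Megastate]  # (z_h, a_h, r_h, z_{h+1})


def m_golf_sketch(
    candidates: List["QFunction"],  # assumed-finite class of candidate Q-functions
    rollout: Callable[["QFunction"], List[List[Transition]]],
    squared_bellman_error: Callable[["QFunction", Sequence[Transition]], float],
    horizon: int,
    num_iterations: int,
    beta: float,  # confidence-set width
) -> "QFunction":
    datasets: List[List[Transition]] = [[] for _ in range(horizon)]
    chosen = candidates[0]
    for _ in range(num_iterations):
        # Optimism: pick the surviving candidate with the largest
        # estimated value at the initial megastate.
        chosen = max(candidates, key=lambda g: g.initial_value())
        # Roll out its greedy policy (which reads only the m-step
        # window) and append the transitions observed at each step h.
        episode = rollout(chosen)
        for h in range(horizon):
            datasets[h].extend(episode[h])
        # Elimination: keep only candidates whose empirical squared
        # Bellman error is within beta of the best fit at every step.
        for h in range(horizon):
            if not datasets[h]:
                continue
            best = min(squared_bellman_error(g, datasets[h]) for g in candidates)
            candidates = [
                g for g in candidates
                if squared_bellman_error(g, datasets[h]) <= best + beta
            ]
    return chosen
```

Representing the confidence set as an explicit list is a simplification for readability: the paper's analysis works with abstract function classes and does not prescribe an implementation, so this only mirrors the construction for finite classes.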