Provable Reinforcement Learning with a Short-Term Memory

Authors: Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel moment matching approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations."
Researcher Affiliation | Collaboration | "(1) Microsoft Research, New York; (2) Princeton University. Correspondence to: Sobhan Miryoosefi <miryoosefi@cs.princeton.edu>."
Pseudocode | Yes | "Algorithm 1 m-GOLF: GOLF for m-step decodable POMDP" (see the sketch after this table)
Open Source Code | No | The paper does not contain any statement about making its source code publicly available, nor a link to a code repository.
Open Datasets | No | This paper is purely theoretical and does not involve experimental training on a dataset.
Dataset Splits | No | This paper is purely theoretical and does not involve training, validation, or test dataset splits.
Hardware Specification | No | This paper is purely theoretical and does not report experiments, so no hardware specifications are mentioned.
Software Dependencies | No | This paper is purely theoretical, focusing on mathematical proofs and algorithm design, and does not mention any software dependencies with specific version numbers.
Experiment Setup | No | This paper is purely theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings.
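For orientation, the sketch below illustrates the general GOLF-style pattern that m-GOLF instantiates: optimistically select a value function from a candidate class, roll out its greedy policy over length-m windows of recent observations and actions ("megastates"), and eliminate candidates with large empirical Bellman error. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1: the names `m_golf_sketch`, `rollout`, `squared_bellman_error`, and `QFunction` are hypothetical, and the function class is assumed finite so the confidence set can be stored as an explicit list.

```python
# Hedged, illustrative sketch of a GOLF-style "optimism plus elimination"
# loop for an m-step decodable POMDP. NOT the paper's Algorithm 1; all
# helper names are hypothetical stand-ins introduced for illustration.

from typing import Callable, List, Sequence, Tuple

# A "megastate" bundles the last m observations and actions; value
# functions in an m-step decodable POMDP may condition only on it.
Megastate = Tuple
Transition = Tuple[Megastate, int, float, Megastate]  # (z_h, a_h, r_h, z_{h+1})


def m_golf_sketch(
    candidates: List["QFunction"],  # assumed-finite class of candidate Q-functions
    rollout: Callable[["QFunction"], List[List[Transition]]],
    squared_bellman_error: Callable[["QFunction", Sequence[Transition]], float],
    horizon: int,
    num_iterations: int,
    beta: float,  # confidence-set width
) -> "QFunction":
    datasets: List[List[Transition]] = [[] for _ in range(horizon)]
    chosen = candidates[0]
    for _ in range(num_iterations):
        # Optimism: pick the surviving candidate with the largest
        # estimated value at the initial megastate.
        chosen = max(candidates, key=lambda g: g.initial_value())
        # Roll out its greedy policy (which reads only the m-step
        # window) and append the transitions observed at each step h.
        episode = rollout(chosen)
        for h in range(horizon):
            datasets[h].extend(episode[h])
        # Elimination: keep only candidates whose empirical squared
        # Bellman error is within beta of the best fit at every step.
        for h in range(horizon):
            if not datasets[h]:
                continue
            best = min(squared_bellman_error(g, datasets[h]) for g in candidates)
            candidates = [
                g for g in candidates
                if squared_bellman_error(g, datasets[h]) <= best + beta
            ]
    return chosen
```

Representing the confidence set as an explicit list is a simplification for readability: the paper's analysis works with abstract function classes and does not prescribe an implementation, so this only mirrors the construction for finite classes.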