Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Provable Reinforcement Learning with a Short-Term Memory
Authors: Yonathan Efroni, Chi Jin, Akshay Krishnamurthy, Sobhan Miryoosefi
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel moment matching approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. |
| Researcher Affiliation | Collaboration | 1Microsoft Research, New York 2Princeton University. Correspondence to: Sobhan Miryoosefi <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 m-GOLF: GOLF for m-step decodable POMDP |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available or providing a link to a code repository. |
| Open Datasets | No | This paper is purely theoretical and does not involve experimental training on a dataset. |
| Dataset Splits | No | This paper is purely theoretical and does not involve experimental validation splits. |
| Hardware Specification | No | This paper is purely theoretical and does not report on experiments, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | This paper is purely theoretical, focusing on mathematical proofs and algorithm design, and does not mention any software dependencies with specific version numbers. |
| Experiment Setup | No | This paper is purely theoretical and does not include details on experimental setup, hyperparameters, or system-level training settings. |