Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Authors: Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

ICLR 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Theoretical | This paper gives, perhaps surprisingly, strong negative results to this question. The main results are exponential lower bounds in terms of the planning horizon H for value-based, model-based, and policy-based algorithms with given good representations. Notably, the requirements on the representation that suffice for sample-efficient RL are even more stringent than the more traditional approximation viewpoint. A comprehensive summary of previous upper bounds and our lower bounds is given in Table 1, and here we briefly summarize our hardness results.
Researcher Affiliation | Academia | Simon S. Du, Institute for Advanced Study (ssdu@ias.edu); Sham M. Kakade, University of Washington, Seattle (sham@cs.washington.edu); Ruosong Wang, Carnegie Mellon University (ruosongw@andrew.cmu.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu)
Pseudocode | No | The paper describes algorithms conceptually, for instance in Section C.1 ("The algorithm learns..." and "Now we present a procedure..."), but it does not include any formally structured pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not contain any statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper focuses on theoretical lower bounds and does not describe experiments using publicly available datasets for training. The discussion of "trajectories" refers to theoretical constructs within MDPs rather than empirical data.
Dataset Splits | No | The paper is theoretical and does not describe empirical validation using dataset splits.
Hardware Specification | No | As a theoretical paper, it does not mention any specific hardware used for running experiments.
Software Dependencies | No | As a theoretical paper, it does not list any software dependencies with version numbers.
Experiment Setup | No | As a theoretical paper, it does not describe any experimental setup details such as hyperparameters or system-level training settings.
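The "Research Type" row quotes the paper's headline claim: exponential lower bounds in the planning horizon H. As a minimal sketch of the standard intuition behind such bounds (a generic "needle in a haystack" instance, not the paper's actual construction), consider a full binary tree MDP of depth H where exactly one of the 2^H leaves yields reward 1 and all others yield 0. Any algorithm must distinguish 2^H deterministic policies, so worst-case sample complexity scales exponentially in H:

```python
# Illustrative only: a generic hard MDP family, not the construction
# used in the paper. At each of H steps the agent chooses left/right;
# one leaf (out of 2^H) gives reward 1, the rest give 0.

def num_leaf_policies(horizon: int) -> int:
    """Distinct deterministic action sequences, one per leaf of the tree."""
    return 2 ** horizon

def worst_case_episodes(horizon: int) -> int:
    """Episodes any deterministic search needs in the worst case: after
    2^H - 1 zero-reward leaves, the remaining leaf must be the target."""
    return 2 ** horizon - 1

for H in (5, 10, 20):
    print(f"H={H}: policies={num_leaf_policies(H)}, "
          f"worst-case episodes={worst_case_episodes(H)}")
```

Even a perfect (e.g. linear) representation of the optimal value function does not help here without further assumptions, which is the flavor of hardness the review row summarizes.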