Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

Authors: Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

ICLR 2020

Reproducibility

Variable | Result | LLM Response
Research Type | Theoretical | This paper gives, perhaps surprisingly, strong negative results to this question. The main results are exponential lower bounds in terms of the planning horizon H for value-based, model-based, and policy-based algorithms with given good representations. Notably, the requirements on the representation that suffice for sample-efficient RL are even more stringent than the more traditional approximation viewpoint. A comprehensive summary of previous upper bounds and our lower bounds is given in Table 1, and here we briefly summarize our hardness results.
Researcher Affiliation | Academia | Simon S. Du, Institute for Advanced Study (ssdu@ias.edu); Sham M. Kakade, University of Washington, Seattle (sham@cs.washington.edu); Ruosong Wang, Carnegie Mellon University (ruosongw@andrew.cmu.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu)
Pseudocode | No | The paper describes algorithms conceptually, for instance in Section C.1 ("The algorithm learns..." and "Now we present a procedure..."), but it does not include any formally structured pseudocode blocks or algorithm listings.
Open Source Code | No | The paper does not contain any statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper focuses on theoretical lower bounds and does not describe experiments using publicly available datasets for training. The discussion of "trajectories" refers to theoretical constructs within MDPs rather than empirical data.
Dataset Splits | No | The paper is theoretical and does not describe empirical validation using dataset splits.
Hardware Specification | No | As a theoretical paper, it does not mention any specific hardware used for running experiments.
Software Dependencies | No | As a theoretical paper, it does not list any software dependencies with version numbers.
Experiment Setup | No | As a theoretical paper, it does not describe any experimental setup details such as hyperparameters or system-level training settings.
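The "Research Type" row quotes the paper's headline claim: exponential lower bounds in the planning horizon H. As a minimal sketch of the standard intuition behind such bounds (a generic "needle in a haystack" instance, not the paper's actual construction), consider a full binary tree MDP of depth H where exactly one of the 2^H leaves yields reward 1 and all others yield 0. Any algorithm must distinguish 2^H deterministic policies, so worst-case sample complexity scales exponentially in H:

```python
# Illustrative only: a generic hard MDP family, not the construction
# used in the paper. At each of H steps the agent chooses left/right;
# one leaf (out of 2^H) gives reward 1, the rest give 0.

def num_leaf_policies(horizon: int) -> int:
    """Distinct deterministic action sequences, one per leaf of the tree."""
    return 2 ** horizon

def worst_case_episodes(horizon: int) -> int:
    """Episodes any deterministic search needs in the worst case: after
    2^H - 1 zero-reward leaves, the remaining leaf must be the target."""
    return 2 ** horizon - 1

for H in (5, 10, 20):
    print(f"H={H}: policies={num_leaf_policies(H)}, "
          f"worst-case episodes={worst_case_episodes(H)}")
```

Even a perfect (e.g. linear) representation of the optimal value function does not help here without further assumptions, which is the flavor of hardness the review row summarizes.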