What are the Statistical Limits of Offline RL with Linear Function Approximation?
Authors: Ruosong Wang, Dean Foster, Sham M. Kakade
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Perhaps surprisingly, our main result shows that even if: 1) we have realizability in that the true value function of every policy is linear in a given set of features and 2) our off-policy data has good coverage over all features (under a strong spectral condition), any algorithm still (information-theoretically) requires a number of offline samples that is exponential in the problem horizon to nontrivially estimate the value of any given policy. |
| Researcher Affiliation | Collaboration | Ruosong Wang, Carnegie Mellon University, ruosongw@andrew.cmu.edu; Dean P. Foster, University of Pennsylvania and Amazon, dean@foster.net; Sham M. Kakade, University of Washington, Seattle and Microsoft Research, sham@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Least-Squares Policy Evaluation |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | No | The paper constructs theoretical |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation on datasets requiring explicit train/validation/test splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on statistical limits and algorithm analysis, not on empirical experiment setup with hyperparameters. |
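
The Pseudocode row above names "Least-Squares Policy Evaluation" without showing it. As a rough illustration only, here is a minimal sketch of a generic finite-horizon least-squares policy evaluation routine with linear features, written with NumPy. It is not taken from the paper and may differ from its Algorithm 1: the function name `lspe`, the `batches` layout, and the `ridge` parameter are all assumptions made for this example.

```python
import numpy as np


def lspe(batches, dim, ridge=1e-6):
    """Hypothetical sketch of finite-horizon least-squares policy evaluation.

    batches[h] is an assumed tuple (phi, rewards, phi_next) for level h:
      phi      : (n, dim) features of the sampled state-action pairs
      rewards  : (n,)     observed rewards
      phi_next : (n, dim) features of (next state, target-policy action);
                 it has no effect at the last level, where the next-level
                 weights are zero.
    Returns one weight vector per level, so that Q_h(s, a) is estimated
    by the inner product of the features of (s, a) with thetas[h].
    """
    horizon = len(batches)
    thetas = [np.zeros(dim) for _ in range(horizon)]
    theta_next = np.zeros(dim)  # value beyond the final level is zero
    for h in reversed(range(horizon)):
        phi, rewards, phi_next = batches[h]
        # Regression target: reward plus estimated next-level value.
        targets = rewards + phi_next @ theta_next
        # Ridge-regularized normal equations.
        gram = phi.T @ phi + ridge * np.eye(dim)
        thetas[h] = np.linalg.solve(gram, phi.T @ targets)
        theta_next = thetas[h]
    return thetas


# Toy usage: two levels, two one-hot features, reward 1 on the first feature.
toy_batches = [
    (np.eye(2), np.array([1.0, 0.0]), np.eye(2)),
    (np.eye(2), np.array([1.0, 0.0]), np.eye(2)),
]
toy_thetas = lspe(toy_batches, dim=2, ridge=0.0)
```

The backward pass regresses each level's targets on that level's features, so estimation error at later levels feeds into earlier ones. This is the kind of estimator whose sample requirements the paper studies.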