The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation
Authors: Philip Amortila, Nan Jiang, Csaba Szepesvári
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted L2 norm (where the weighting is the offline state distribution) and the L∞ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data. (A minimal numerical sketch of this approximation-factor setting follows the table.) |
| Researcher Affiliation | Academia | ¹University of Illinois, Urbana-Champaign; ²University of Alberta. Correspondence to: Philip Amortila <philipa4@illinois.edu>. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments with a dataset; thus, no public dataset is used or made available. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments that would require dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any software dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings. |
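
To make the approximation-factor notion from the abstract concrete, here is a minimal numerical sketch, not taken from the paper: it builds a small synthetic Markov reward process, fits misspecified linear features under an off-policy state distribution μ, and compares the off-policy TD (LSTD) fixed-point error against the best-in-class weighted-L2 error. The 5-state MRP, the rank-2 features, the random seed, and all variable names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 5-state Markov reward process induced by the target policy.
S, gamma = 5, 0.9
P = rng.dirichlet(np.ones(S), size=S)          # row-stochastic transition matrix
r = rng.standard_normal(S)                     # reward vector
v = np.linalg.solve(np.eye(S) - gamma * P, r)  # true value function v^pi

# Misspecified linear features (rank 2 < S) and an off-policy distribution mu.
Phi = rng.standard_normal((S, 2))
mu = rng.dirichlet(np.ones(S))
D = np.diag(mu)

def mu_norm(x):
    """Weighted L2 norm ||x||_{2,mu}, with mu the offline state distribution."""
    return np.sqrt(np.sum(mu * x ** 2))

# Best-in-class (oracle) error eps = min_theta ||Phi theta - v||_{2,mu},
# solved as a weighted least-squares problem.
theta_star, *_ = np.linalg.lstsq(np.sqrt(D) @ Phi, np.sqrt(D) @ v, rcond=None)
eps = mu_norm(Phi @ theta_star - v)

# Off-policy TD (LSTD) fixed point under mu:
#   Phi^T D (Phi - gamma * P @ Phi) theta = Phi^T D r.
# (Assumed invertible here; off-policy, this system can be ill-conditioned.)
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
theta_td = np.linalg.solve(A, b)

# The ratio below is this instance's empirical approximation factor: how much
# the estimator's mu-weighted error exceeds the best achievable error.
print(f"oracle error eps     = {eps:.4f}")
print(f"LSTD error           = {mu_norm(Phi @ theta_td - v):.4f}")
print(f"approximation factor = {mu_norm(Phi @ theta_td - v) / eps:.2f}")
```

The printed ratio is the empirical approximation factor for this one instance. The paper's results characterize, across settings (state aliasing, coverage of the state space, choice of norm), when such factors admit finite instance-dependent bounds and when they can be unbounded even in the limit of infinite data.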