The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Authors: Philip Amortila, Nan Jiang, Csaba Szepesvari

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted L2-norm (where the weighting is the offline state distribution) and the L∞ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.
Researcher Affiliation | Academia | University of Illinois, Urbana-Champaign; University of Alberta. Correspondence to: Philip Amortila <philipa4@illinois.edu>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not conduct experiments with a dataset, thus no public dataset is used or made available.
Dataset Splits | No | The paper is theoretical and does not conduct experiments that would require dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not describe any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
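
For readers unfamiliar with the setting described in the abstract above, the following is a minimal numerical sketch, not taken from the paper, of misspecified linear off-policy value estimation. It builds a toy tabular MDP, computes an LSTD-style solution under an offline state distribution, and reports how much the estimation error exceeds the best-in-class approximation error; the worst-case form of that ratio is the kind of quantity the paper calls an approximation factor. The MDP, features, and choice of estimator here are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's method): a toy tabular MDP,
# misspecified linear features, and an LSTD-style off-policy estimator.
import numpy as np

rng = np.random.default_rng(0)
S, d, gamma = 6, 2, 0.9                    # number of states, feature dim, discount

# Target-policy transition matrix P (rows sum to 1) and reward vector r.
P = rng.dirichlet(np.ones(S), size=S)
r = rng.uniform(size=S)

# True value function V^pi = (I - gamma P)^{-1} r.
V_pi = np.linalg.solve(np.eye(S) - gamma * P, r)

# Misspecified features: d < S, so V^pi generally lies outside span(Phi).
Phi = rng.normal(size=(S, d))

# Offline state distribution mu (full coverage in this toy instance).
mu = rng.dirichlet(np.ones(S))
D = np.diag(mu)

# LSTD-style solution: Phi^T D (Phi - gamma P Phi) theta = Phi^T D r.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
V_hat = Phi @ np.linalg.solve(A, b)

def mu_norm(v):
    """mu-weighted L2 norm (the weighting used in the paper's L2 results)."""
    return np.sqrt(np.sum(mu * v ** 2))

# Best achievable (oracle) approximation error within the feature class.
theta_star, *_ = np.linalg.lstsq(np.sqrt(D) @ Phi, np.sqrt(D) @ V_pi, rcond=None)
eps_approx = mu_norm(Phi @ theta_star - V_pi)   # inf_theta ||Phi theta - V^pi||_{2,mu}
est_error = mu_norm(V_hat - V_pi)               # ||V_hat - V^pi||_{2,mu}

print(f"best-in-class approximation error: {eps_approx:.4f}")
print(f"LSTD estimation error            : {est_error:.4f}")
print(f"empirical blow-up factor         : {est_error / eps_approx:.2f}")
```

Varying this sketch toward partial coverage (zero entries in mu) or aliased features shows the blow-up growing; those are the kinds of regimes the paper's instance-dependent upper and lower bounds characterize.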