The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Authors: Philip Amortila, Nan Jiang, Csaba Szepesvari

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted L2-norm (where the weighting is the offline state distribution) and the L∞ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.
Researcher Affiliation | Academia | University of Illinois, Urbana-Champaign; University of Alberta. Correspondence to: Philip Amortila <philipa4@illinois.edu>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not conduct experiments with a dataset, thus no public dataset is used or made available.
Dataset Splits | No | The paper is theoretical and does not conduct experiments that would require dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not describe any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or system-level training settings.
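
For readers unfamiliar with the setting described in the abstract above, the following is a minimal numerical sketch, not taken from the paper, of misspecified linear off-policy value estimation. It builds a toy tabular MDP, computes an LSTD-style solution under an offline state distribution, and reports how much the estimation error exceeds the best-in-class approximation error; the worst-case form of that ratio is the kind of quantity the paper calls an approximation factor. The MDP, features, and choice of estimator here are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's method): a toy tabular MDP,
# misspecified linear features, and an LSTD-style off-policy estimator.
import numpy as np

rng = np.random.default_rng(0)
S, d, gamma = 6, 2, 0.9                    # number of states, feature dim, discount

# Target-policy transition matrix P (rows sum to 1) and reward vector r.
P = rng.dirichlet(np.ones(S), size=S)
r = rng.uniform(size=S)

# True value function V^pi = (I - gamma P)^{-1} r.
V_pi = np.linalg.solve(np.eye(S) - gamma * P, r)

# Misspecified features: d < S, so V^pi generally lies outside span(Phi).
Phi = rng.normal(size=(S, d))

# Offline state distribution mu (full coverage in this toy instance).
mu = rng.dirichlet(np.ones(S))
D = np.diag(mu)

# LSTD-style solution: Phi^T D (Phi - gamma P Phi) theta = Phi^T D r.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r
V_hat = Phi @ np.linalg.solve(A, b)

def mu_norm(v):
    """mu-weighted L2 norm (the weighting used in the paper's L2 results)."""
    return np.sqrt(np.sum(mu * v ** 2))

# Best achievable (oracle) approximation error within the feature class.
theta_star, *_ = np.linalg.lstsq(np.sqrt(D) @ Phi, np.sqrt(D) @ V_pi, rcond=None)
eps_approx = mu_norm(Phi @ theta_star - V_pi)   # inf_theta ||Phi theta - V^pi||_{2,mu}
est_error = mu_norm(V_hat - V_pi)               # ||V_hat - V^pi||_{2,mu}

print(f"best-in-class approximation error: {eps_approx:.4f}")
print(f"LSTD estimation error            : {est_error:.4f}")
print(f"empirical blow-up factor         : {est_error / eps_approx:.2f}")
```

Varying this sketch toward partial coverage (zero entries in mu) or aliased features shows the blow-up growing; those are the kinds of regimes the paper's instance-dependent upper and lower bounds characterize.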