Off-policy estimation with adaptively collected data: the power of online learning
Authors: Jeonghwan Lee, Cong Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties including the semiparametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill in the gap, we first present generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depends on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations in (1) the tabular case; (2) the case of linear function approximation; and (3) the case of general function approximation for the outcome model. We then provide a local minimax lower bound to show the instance-dependent optimality of the AIPW estimator using no-regret online learning algorithms. (From NeurIPS Checklist): We don’t have experimental results in this paper. |
| Researcher Affiliation | Academia | Jeonghwan Lee, Department of Statistics, The University of Chicago, Chicago, IL 60637, jhlee97@uchicago.edu; Cong Ma, Department of Statistics, The University of Chicago, Chicago, IL 60637, congm@uchicago.edu |
| Pseudocode | Yes | Algorithm 1: Meta-algorithm: augmented inverse propensity weighting (AIPW) estimator; Algorithm 2: Online non-parametric regression protocol for estimation of the treatment effect; Algorithm 3: Online gradient descent (OGD) algorithm for the finite state-action space; Algorithm 4: Online gradient descent (OGD) algorithm for linear function approximation; Algorithm 5: A generic forecaster based on the relaxation recipe proposed in [47] |
| Open Source Code | No | We don’t have experimental results in this paper. |
| Open Datasets | No | We don’t have experimental results in this paper. |
| Dataset Splits | No | We don’t have experimental results in this paper. |
| Hardware Specification | No | We don’t have experimental results in this paper. |
| Software Dependencies | No | We don’t have experimental results in this paper. |
| Experiment Setup | No | We don’t have experimental results in this paper. |
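
Since the paper ships no code, the following is a minimal illustrative sketch of the general AIPW idea named in the abstract, not a reproduction of the paper's Algorithm 1. It assumes a context-free bandit with a finite action set and known logging propensities, and it swaps in simple running means for the online learner; the paper's instantiations use online gradient descent and other no-regret forecasters. The function name `aipw_estimate` and all of its parameters are hypothetical.

```python
from typing import Sequence


def aipw_estimate(
    actions: Sequence[int],
    rewards: Sequence[float],
    propensities: Sequence[float],
    target: Sequence[float],
) -> float:
    """Sequential AIPW estimate of the target policy's value (illustrative).

    actions[t]      : action taken at round t
    rewards[t]      : reward observed at round t
    propensities[t] : probability with which the (possibly adaptive)
                      logging policy chose actions[t] at round t
    target[a]       : probability the target policy assigns to action a
    """
    n_actions = len(target)
    sums = [0.0] * n_actions   # running reward sums per action
    counts = [0] * n_actions   # running pull counts per action
    total = 0.0

    for a, y, p in zip(actions, rewards, propensities):
        # Outcome estimate built ONLY from rounds before t (assumption:
        # running mean, 0.0 for actions not yet observed).
        mu = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(n_actions)]

        # Plug-in (direct) term plus importance-weighted residual correction.
        direct = sum(target[k] * mu[k] for k in range(n_actions))
        correction = target[a] / p * (y - mu[a])
        total += direct + correction

        # Update the outcome estimate after it has been used for round t.
        sums[a] += y
        counts[a] += 1

    return total / len(actions)
```

The outcome estimate at each round depends only on earlier rounds, which is what keeps the residual correction unbiased when the logging policy adapts to past data. The quality of that sequence of estimates is the quantity the abstract's "sequentially weighted error" refers to.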