Off-Policy Evaluation with Deficient Support Using Side Information

Authors: Nicolò Felicioni, Maurizio Ferrari Dacrema, Marcello Restelli, Paolo Cremonesi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we consider two alternative estimators for the deficient support OPE problem. We first show how to adapt an estimator that was originally proposed for a different domain to the deficient support setting. Then, we propose another estimator, which is a novel contribution of this paper. These estimators exploit additional information about the actions, which we call side information, in order to make reliable estimates on the unsupported actions. Under alternative assumptions that do not require full support, we show that the considered estimators are unbiased. We also provide a theoretical analysis of the concentration when relaxing all the assumptions. Finally, we provide an experimental evaluation showing how the considered estimators are better suited for the deficient support setting compared to the baselines.
Researcher Affiliation | Academia | Nicolò Felicioni, Politecnico di Milano, nicolo.felicioni@polimi.it; Maurizio Ferrari Dacrema, Politecnico di Milano, maurizio.ferrari@polimi.it; Marcello Restelli, Politecnico di Milano, marcello.restelli@polimi.it; Paolo Cremonesi, Politecnico di Milano, paolo.cremonesi@polimi.it
Pseudocode | Yes | This pre-processing protocol is summarized in Algorithm 1, presented in Appendix B.
Open Source Code | Yes | The code used for the experiments can be found at https://github.com/recsyspolimi/neurips-2022-ope-side-info.
Open Datasets | Yes | The dataset that we use is the Open Bandit Dataset (OBD), released with Open Bandit Pipeline. OBD contains logged bandit feedback from a real-world application (a large-scale fashion e-commerce platform). There are three campaigns available, namely "ALL", "Men", and "Women". We select the "ALL" campaign.
Dataset Splits | No | The paper describes drawing a random sub-sample of the logged dataset and performing bootstrap evaluation with random seeds. Although it describes data processing and evaluation, it does not specify explicit train, validation, or test splits (as percentages or sample counts) for model training or selection.
Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, 64 GB RAM, and an NVIDIA GeForce RTX 2080 Ti GPU (11 GB GDDR6 RAM).
Software Dependencies | No | The paper mentions using "two Python packages: Open Bandit Pipeline [49] and PyIEOE [50]" and states that LightGBM was used as a regression model. However, it does not give version numbers for any of these components (Python, Open Bandit Pipeline, PyIEOE, or LightGBM).
Experiment Setup | Yes | The second proposal is to create a clustering of the actions, which induces a partition of the action set A. This is done by applying K-Means clustering (with k = 30) to the normalized action feature vectors f(a)/‖f(a)‖₂.
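The Research Type row targets the deficient support setting, where the logging policy assigns zero probability to some actions the target policy may choose. The toy sketch below shows why this breaks standard inverse propensity scoring (IPS), a common OPE baseline. It is not one of the paper's estimators. The three-action problem, its rewards, and the function names (`true_value`, `expected_ips`, `sampled_ips`) are all hypothetical.

```python
import random

# Hypothetical toy problem: a single context and three actions with
# deterministic rewards. Action 2 has zero logging probability (deficient support).
REWARDS = [0.2, 0.4, 1.0]
LOGGING = [0.5, 0.5, 0.0]  # behavior (logging) policy pi_0
TARGET = [0.0, 0.5, 0.5]   # target policy pi_e, which does choose action 2


def true_value(policy, rewards):
    """Exact value of a policy: sum over actions of pi(a) * r(a)."""
    return sum(p * r for p, r in zip(policy, rewards))


def expected_ips(logging, target, rewards):
    """Expectation of the IPS estimate under the logging policy.

    Only actions with logging[a] > 0 can appear in the logged data, so the
    unsupported actions contribute nothing to the estimate.
    """
    return sum(
        logging[a] * (target[a] / logging[a]) * rewards[a]
        for a in range(len(rewards))
        if logging[a] > 0
    )


def sampled_ips(logging, target, rewards, n, seed):
    """IPS estimate (1/n) * sum_i pi_e(a_i) / pi_0(a_i) * r_i over n logged rounds."""
    rng = random.Random(seed)
    # Log n actions from pi_0. An action with weight 0 is never drawn,
    # so the division below never hits a zero propensity.
    actions = rng.choices(range(len(rewards)), weights=logging, k=n)
    return sum(target[a] / logging[a] * rewards[a] for a in actions) / n
```

The logging policy never selects action 2, so the target policy's contribution from that action (0.5 × 1.0) never appears in the data. IPS therefore converges to 0.2 instead of the true value 0.7, however many rounds are logged. Side information about the actions is what the considered estimators use to reason about such unsupported actions.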
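The Dataset Splits row mentions a random sub-sample of the logged data and bootstrap evaluation with random seeds, without further detail. The sketch below is a generic, hypothetical illustration of those two ideas, not the authors' exact protocol: seeded sub-sampling followed by a percentile bootstrap confidence interval for a mean. It uses only the standard library, and the helper names `subsample` and `bootstrap_ci` are assumptions.

```python
import random
import statistics


def subsample(data, n, seed):
    """Draw a random sub-sample of size n without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(data, n)


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap for the mean of `values`."""
    rng = random.Random(seed)
    k = len(values)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(statistics.fmean(rng.choices(values, k=k)) for _ in range(n_boot))
    lower = means[round(alpha / 2 * n_boot)]
    upper = means[round((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper
```

Passing explicit seeds makes both the sub-sample and the interval reproducible from run to run. The procedure still does not define a train, validation, or test split.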
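The Experiment Setup row describes partitioning the action set A by running K-Means (k = 30) on the L2-normalized action feature vectors f(a)/‖f(a)‖₂. The sketch below assumes the features are stored as rows of a NumPy array. The helper names (`normalize_rows`, `kmeans`, `action_partition`) are hypothetical, and the sketch uses a plain Lloyd's-algorithm K-Means rather than whichever library implementation the authors used. scikit-learn's `KMeans` would be a natural alternative.

```python
import numpy as np


def normalize_rows(F):
    """Scale each action feature vector f(a) to unit L2 norm: f(a) / ||f(a)||_2."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points (fancy indexing copies them).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return labels


def action_partition(labels, k):
    """Map each cluster id to the indices of the actions it contains."""
    return {j: np.flatnonzero(labels == j) for j in range(k)}


# Usage on 80 hypothetical actions with random 5-dimensional features.
features = np.random.default_rng(0).normal(size=(80, 5))
labels = kmeans(normalize_rows(features), k=30, seed=0)
partition = action_partition(labels, k=30)
```

Normalizing first makes clusters group actions by the direction of their feature vectors rather than by their magnitude. For unit vectors, ‖u − v‖² = 2 − 2 u·v, so Euclidean K-Means on the normalized features behaves like clustering by cosine similarity.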