reproducibilityindex.ai

Off-Policy Evaluation with Deficient Support Using Side Information

Authors: Nicolò Felicioni, Maurizio Ferrari Dacrema, Marcello Restelli, Paolo Cremonesi

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we consider two alternative estimators for the deficient support OPE problem. We first show how to adapt an estimator that was originally proposed for a different domain to the deficient support setting. Then, we propose another estimator, which is a novel contribution of this paper. These estimators exploit additional information about the actions, which we call side information, in order to make reliable estimates on the unsupported actions. Under alternative assumptions that do not require full support, we show that the considered estimators are unbiased. We also provide a theoretical analysis of the concentration when relaxing all the assumptions. Finally, we provide an experimental evaluation showing how the considered estimators are better suited for the deficient support setting compared to the baselines.
Researcher Affiliation	Academia	Nicolò Felicioni Politecnico di Milano nicolo.felicioni@polimi.it Maurizio Ferrari Dacrema Politecnico di Milano maurizio.ferrari@polimi.it Marcello Restelli Politecnico di Milano marcello.restelli@polimi.it Paolo Cremonesi Politecnico di Milano paolo.cremonesi@polimi.it
Pseudocode	Yes	This pre-processing protocol is summarized in Algorithm 1, presented in Appendix B.
Open Source Code	Yes	The code used for the experiments can be found at https://github.com/recsyspolimi/neurips-2022-ope-side-info.
Open Datasets	Yes	The dataset that we use is the Open Bandit Dataset (OBD), released with Open Bandit Pipeline. OBD contains logged bandit feedback from a real-world application (a large-scale fashion e-commerce platform). There are three campaigns available, namely "ALL", "Men", and "Women". We select the "ALL" campaign.
Dataset Splits	No	The paper describes using a random sub-sample from a logging dataset and performing bootstrap evaluation with random seeds. While it refers to data processing and evaluation, it does not specify explicit train, validation, or test dataset splits in terms of percentages or sample counts for model training or selection.
Hardware Specification	Yes	All experiments were run on a server with an Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz, 64 GB RAM, and an NVIDIA GeForce RTX 2080 Ti GPU (11 GB GDDR6 RAM).
Software Dependencies	No	The paper mentions using "two Python packages: Open Bandit Pipeline [49] and pyIEOE [50]" and states that LightGBM was used as a regression model. However, it does not provide specific version numbers for any of these software components (Python, Open Bandit Pipeline, pyIEOE, or LightGBM).
Experiment Setup	Yes	The second proposal is to create a clustering of the actions, which induces a partition of the action set A. This is done by applying K-Means clustering (we set k = 30) to the L2-normalized action feature vectors f(a)/‖f(a)‖_2.
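
The clustering step described in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: it is a plain-Python version of Lloyd's K-Means algorithm, used here as a dependency-free stand-in for a library implementation. The function names (l2_normalize, kmeans_partition), the toy feature vectors, and the choice k = 2 are assumptions made for illustration only; the paper sets k = 30 on the real action features.

```python
import math
import random


def l2_normalize(vec):
    """Return f(a) / ||f(a)||_2; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_partition(action_features, k, n_iters=50, seed=0):
    """Cluster L2-normalized action features with Lloyd's algorithm.

    Returns one cluster label per action; actions sharing a label
    form one block of the induced partition of the action set.
    """
    rng = random.Random(seed)
    points = [l2_normalize(f) for f in action_features]
    # Initialize centroids with k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iters):
        # Assignment step: each action goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy features: three actions pointing roughly along the
# x-axis and three roughly along the y-axis, with very different norms.
features = [[1.0, 0.1], [2.0, 0.0], [10.0, 0.5],
            [0.0, 3.0], [0.1, 1.0], [0.2, 7.0]]
labels = kmeans_partition(features, k=2)
print(labels)
```

Because the vectors are normalized before clustering, actions are grouped by the direction of their feature vector rather than its magnitude, which is why the toy vectors with norms near 1 and near 10 can end up in the same cluster.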