Optimal Off-Policy Evaluation from Multiple Logging Policies

Authors: Nathan Kallus, Yuta Saito, Masatoshi Uehara

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the benefits of our methods' efficiently leveraging the stratified sampling of off-policy data from multiple loggers.
Researcher Affiliation | Academia | Cornell University, NY, USA. Correspondence to: Masatoshi Uehara <mu223@cornell.edu>
Pseudocode | Yes | Algorithm 1: Feasible Cross-Fold Version of Γ(D; h, g)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We evaluate our estimators using multiclass classification datasets from the UCI repository. Here we consider the optdigits and pendigits datasets (see Table 3 in Appendix E).
Dataset Splits | Yes | We split the original data into training (30%) and evaluation (70%) sets.
Hardware Specification | No | The paper does not explicitly mention the specific hardware used to run the experiments (e.g., GPU/CPU models, cloud instances).
Software Dependencies | No | The paper mentions "We use tensorflow" but does not provide version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | We split the original data into training (30%) and evaluation (70%) sets. ... We vary ρ1/(1 − ρ1) = n1/n2 in {0.1, 0.25, 0.5, 1, 2, 4, 10}. ... We repeat the process M = 200 times with different random seeds ... For all estimators, we estimate the logging policies using logistic regression on the evaluation set with 2-fold cross-fitting as in Algorithm 1. ... For DR, DR-Avg, and DR-PW, we construct q-estimates using logistic regression, again with 2-fold cross-fitting as in Algorithm 1.
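
Since no code release is noted above, the following is a minimal sketch, under stated assumptions, of the logging-policy estimation step described in the Experiment Setup row: logistic regression with 2-fold cross-fitting, in the spirit of Algorithm 1. The function name cross_fit_propensities and the synthetic data are illustrative stand-ins, not the authors' implementation; the bandit-feedback construction from the UCI datasets and the IPW/DR estimators themselves are only summarized in comments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cross_fit_propensities(X, actions, n_splits=2, seed=0):
    """Cross-fitted logging-policy estimates pi_hat(a|x): each fold's
    propensities are predicted by a logistic regression fit on the other fold."""
    n_actions = int(actions.max()) + 1
    pi_hat = np.zeros((len(actions), n_actions))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, pred_idx in kf.split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[fit_idx], actions[fit_idx])
        # predict_proba columns follow clf.classes_, so map them back explicitly
        pi_hat[np.ix_(pred_idx, clf.classes_)] = clf.predict_proba(X[pred_idx])
    return pi_hat

# Tiny synthetic stand-in for the bandit feedback derived from optdigits/pendigits.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # contexts
a = rng.integers(0, 3, size=500)         # logged actions
pi_hat = cross_fit_propensities(X, a)
print(pi_hat.shape)                      # (500, 3); each row sums to 1

# The full protocol reported in the paper additionally
#   * splits each UCI dataset into 30% training / 70% evaluation sets,
#   * varies the logger ratio n1/n2 = rho_1/(1 - rho_1) over {0.1, 0.25, 0.5, 1, 2, 4, 10},
#   * repeats everything M = 200 times with different random seeds, and
#   * builds the q-estimates for DR, DR-Avg, and DR-PW with the same 2-fold cross-fitting.
```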