Optimal Off-Policy Evaluation from Multiple Logging Policies
Authors: Nathan Kallus, Yuta Saito, Masatoshi Uehara
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the benefits of our methods' efficiently leveraging the stratified sampling of off-policy data from multiple loggers. |
| Researcher Affiliation | Academia | Cornell University, NY, USA. Correspondence to: Masatoshi Uehara <mu223@cornell.edu> |
| Pseudocode | Yes | Algorithm 1 Feasible Cross-Fold Version of Γ(D; h, g) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our estimators using multiclass classification datasets from the UCI repository. Here we consider the optdigits and pendigits datasets (see Table 3 in Appendix E). |
| Dataset Splits | Yes | We split the original data into training (30%) and evaluation (70%) sets. |
| Hardware Specification | No | The paper does not explicitly mention the specific hardware used to run the experiments (e.g., GPU/CPU models, cloud instances). |
| Software Dependencies | No | The paper mentions "We use tensorflow" but does not provide any version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We split the original data into training (30%) and evaluation (70%) sets. ... We vary ρ1/(1 − ρ1) = n1/n2 in {0.1, 0.25, 0.5, 1, 2, 4, 10}. ... We repeat the process M = 200 times with different random seeds ... For all estimators, we estimate the logging policies using logistic regression on the evaluation set with 2-fold cross-fitting as in Algorithm 1. ... For DR, DR-Avg, and DR-PW, we construct q-estimates using logistic regression again using 2-fold cross-fitting as in Algorithm 1. |
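The setup quoted above (a 30%/70% train/evaluation split, then logging-policy estimation via logistic regression with 2-fold cross-fitting) can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic `make_classification` data stands in for the UCI optdigits/pendigits datasets, and the helper `cross_fit_propensities` is a hypothetical name for the cross-fitting step described in the paper's Algorithm 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

# Synthetic stand-in for the UCI multiclass datasets used in the paper.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)

# 30% training / 70% evaluation split, as described in the paper.
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, train_size=0.3, random_state=0)


def cross_fit_propensities(X, y, n_folds=2, seed=0):
    """Estimate class probabilities with cross-fitting: each fold's
    predictions come from a model fit on the complementary fold, so no
    point is scored by a model that saw it during training."""
    probs = np.zeros((len(y), len(np.unique(y))))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for fit_idx, pred_idx in kf.split(X):
        model = LogisticRegression(max_iter=1000).fit(X[fit_idx], y[fit_idx])
        # Align predict_proba columns with the global class labels.
        probs[np.ix_(pred_idx, model.classes_)] = model.predict_proba(X[pred_idx])
    return probs


# 2-fold cross-fitted logging-policy estimates on the evaluation set.
pi_hat = cross_fit_propensities(X_ev, y_ev, n_folds=2)
print(pi_hat.shape)  # one probability vector per evaluation point
```

In the paper's protocol this whole procedure would be repeated M = 200 times with different random seeds while varying the logger sample-size ratio n1/n2; those outer loops are omitted here for brevity.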