Optimal Off-Policy Evaluation from Multiple Logging Policies

Authors: Nathan Kallus, Yuta Saito, Masatoshi Uehara

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the benefits of our methods' efficiently leveraging the stratified sampling of off-policy data from multiple loggers.
Researcher Affiliation | Academia | Cornell University, NY, USA. Correspondence to: Masatoshi Uehara <mu223@cornell.edu>
Pseudocode | Yes | Algorithm 1: Feasible Cross-Fold Version of Γ(D; h, g)
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | We evaluate our estimators using multiclass classification datasets from the UCI repository. Here we consider the optdigits and pendigits datasets (see Table 3 in Appendix E).
Dataset Splits | Yes | We split the original data into training (30%) and evaluation (70%) sets.
Hardware Specification | No | The paper does not explicitly mention the specific hardware used to run the experiments (e.g., GPU/CPU models, cloud instances).
Software Dependencies | No | The paper mentions "We use tensorflow" but does not provide version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | We split the original data into training (30%) and evaluation (70%) sets. ... We vary ρ1/(1 − ρ1) = n1/n2 in {0.1, 0.25, 0.5, 1, 2, 4, 10}. ... We repeat the process M = 200 times with different random seeds ... For all estimators, we estimate the logging policies using logistic regression on the evaluation set with 2-fold cross-fitting as in Algorithm 1. ... For DR, DR-Avg, and DR-PW, we construct q-estimates using logistic regression, again with 2-fold cross-fitting as in Algorithm 1.
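
Since no code release is noted above, the following is a minimal sketch, under stated assumptions, of the logging-policy estimation step described in the Experiment Setup row: logistic regression with 2-fold cross-fitting, in the spirit of Algorithm 1. The function name cross_fit_propensities and the synthetic data are illustrative stand-ins, not the authors' implementation; the bandit-feedback construction from the UCI datasets and the IPW/DR estimators themselves are only summarized in comments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cross_fit_propensities(X, actions, n_splits=2, seed=0):
    """Cross-fitted logging-policy estimates pi_hat(a|x): each fold's
    propensities are predicted by a logistic regression fit on the other fold."""
    n_actions = int(actions.max()) + 1
    pi_hat = np.zeros((len(actions), n_actions))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, pred_idx in kf.split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[fit_idx], actions[fit_idx])
        # predict_proba columns follow clf.classes_, so map them back explicitly
        pi_hat[np.ix_(pred_idx, clf.classes_)] = clf.predict_proba(X[pred_idx])
    return pi_hat

# Tiny synthetic stand-in for the bandit feedback derived from optdigits/pendigits.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # contexts
a = rng.integers(0, 3, size=500)         # logged actions
pi_hat = cross_fit_propensities(X, a)
print(pi_hat.shape)                      # (500, 3); each row sums to 1

# The full protocol reported in the paper additionally
#   * splits each UCI dataset into 30% training / 70% evaluation sets,
#   * varies the logger ratio n1/n2 = rho_1/(1 - rho_1) over {0.1, 0.25, 0.5, 1, 2, 4, 10},
#   * repeats everything M = 200 times with different random seeds, and
#   * builds the q-estimates for DR, DR-Avg, and DR-PW with the same 2-fold cross-fitting.
```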