On the Design of Estimators for Bandit Off-Policy Evaluation
Authors: Nikos Vlassis, Aurelien Bibaut, Maria Dimakopoulou, Tony Jebara
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present our main results in the context of multi-armed bandits, and we describe a simple design for contextual bandits that gives rise to an estimator that is shown to perform well in multi-class cost-sensitive classification datasets. We use the same 9 benchmark datasets from the UCI repository (Dua & Graff, 2017; Asuncion & Newman, 2007) as in Dudík et al. (2014). In Table 1 we report the RMSE of the different estimators for each benchmark. |
| Researcher Affiliation | Collaboration | 1Netflix, Los Gatos CA, USA 2Department of Biostatistics, University of California Berkeley, Berkeley, USA. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The code for all experiments is available by request from the authors. |
| Open Datasets | Yes | We use the same 9 benchmark datasets from the UCI repository (Dua & Graff, 2017; Asuncion & Newman, 2007) as in Dudík et al. (2014). |
| Dataset Splits | No | The paper mentions splitting data into training and test sets but does not provide specific details on a separate validation split or the exact percentages/counts for these splits. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for experiments are provided. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned (e.g., library names with their versions). |
| Experiment Setup | Yes | For the evaluation on a dataset, we follow the methodology of Dudík et al. (2014). We randomly split data into training and test sets of the same size. We run logistic regression to obtain a classifier π. The logging policy µ selects label π(x) with probability ε = 0.05, and with probability 1 − ε it selects one of the other labels {1, 2, . . . , K} \ π(x) uniformly at random. We use a linear loss model r̂(x, a) = wₐᵀx parameterized by K weight vectors {wₐ}, a ∈ {1, . . . , K}, and use least-squares regression to fit wₐ based on a partially labeled dataset from the training set. For each dataset, we repeat step 4 N = 500 times. |
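The experiment setup quoted above can be sketched as follows. This is a minimal illustration, not the authors' code (which is available only by request): it uses synthetic stand-in data rather than a UCI dataset, and all names and sizes (`n`, `d`, `K`, `EPSILON`) are illustrative assumptions. The three ingredients match the excerpt: a logistic-regression classifier π, a logging policy µ that picks π(x) with probability ε = 0.05 and another label uniformly otherwise, and a per-action linear reward model fit by least squares on the logged (partially labeled) data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
EPSILON = 0.05

# Synthetic stand-in for a K-class UCI dataset: n samples, d features.
n, d, K = 600, 5, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, K, size=n)

# Random 50/50 train/test split, as in the quoted methodology.
perm = rng.permutation(n)
tr, te = perm[: n // 2], perm[n // 2 :]

# Logistic regression gives the deterministic classifier pi.
clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])

def logging_policy_probs(x):
    """mu picks pi(x) w.p. EPSILON; otherwise one of the other
    K-1 labels uniformly at random (as in the excerpt above)."""
    p = np.full(K, (1.0 - EPSILON) / (K - 1))
    p[clf.predict(x[None, :])[0]] = EPSILON
    return p

# Log actions and 0/1 rewards (correct label -> reward 1) under mu.
probs = np.array([logging_policy_probs(x) for x in X[tr]])
actions = np.array([rng.choice(K, p=p) for p in probs])
rewards = (actions == y[tr]).astype(float)

# Per-action linear reward model r_hat(x, a) = w_a^T x, fit by
# least squares on the partially labeled logged data.
W = np.zeros((K, d))
for a in range(K):
    mask = actions == a
    if mask.sum() >= d:  # skip actions with too few logged samples
        W[a], *_ = np.linalg.lstsq(X[tr][mask], rewards[mask], rcond=None)

r_hat = X[te] @ W.T  # predicted rewards on the test set, shape (n/2, K)
```

In the paper this whole procedure is repeated N = 500 times per dataset and the RMSE of each estimator is reported; the sketch above covers a single repetition.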