Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

Authors: Masatoshi Uehara, Masahiro Kato, Shota Yasui

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct experiments to confirm the effectiveness of the proposed estimators. In this section, we demonstrate the effectiveness of the proposed estimators using data obtained with bandit feedback. Following previous work (Dudík et al., 2011; Farajtabar et al., 2018), we evaluate the proposed estimators using the standard classification datasets from the UCI repository by transforming the classification data into contextual bandit data. From the UCI repository, we use the satimage, vehicle, and pendigits datasets.
Researcher Affiliation | Collaboration | Masatoshi Uehara (1), Masahiro Kato (2), Shota Yasui (2). (1) Cornell University, mu223@cornell.edu; (2) CyberAgent Inc., masahiro_kato@cyberagent.co.jp, yasui_shota@cyberagent.co.jp
Pseudocode | Yes | Algorithm 1: Doubly Robust Estimator under a Covariate Shift
Open Source Code | No | The paper does not contain any statements about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | From the UCI repository, we use the satimage, vehicle, and pendigits datasets. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
Dataset Splits | Yes | By adjusting Cprob, we classify 70% samples as the historical data and 30% samples as the evaluation data. For this estimator, we use 2-fold cross-fitting.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions statistical methods and tools such as 'kernel Ridge regression', 'KuLSIF', and 'Nadaraya-Watson regression', but does not name any software with version numbers for the implementation.
Experiment Setup | No | For DRCS, we use 2-fold cross-fitting and add a regularization term. More details, such as the description of the data and choice of hyperparameters, are in Appendix H. The main text does not contain specific hyperparameter values.
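
The Research Type entry mentions transforming classification data into contextual bandit data. Since no source code is released, the following is only a minimal sketch of the usual construction from the cited prior work, not the authors' implementation. The function name classification_to_bandit, the uniform behavior policy, and the synthetic labels are all assumptions made for illustration. Each class is treated as an action, one action is sampled per sample from a behavior policy, and the logged reward is 1 when the sampled action matches the true label.

```python
import numpy as np


def classification_to_bandit(labels, behavior_probs, rng):
    """Convert multiclass labels into logged bandit feedback (illustrative helper).

    labels:         (n,) integer array of true classes in {0, ..., k-1}
    behavior_probs: (n, k) array; row i is the behavior policy's action
                    distribution for sample i (each row sums to 1)
    rng:            a numpy.random.Generator
    """
    n, k = behavior_probs.shape
    # Sample one action per row by inverting the cumulative distribution.
    cdf = np.cumsum(behavior_probs, axis=1)
    u = rng.random(n)
    actions = (u[:, None] >= cdf).sum(axis=1)
    # Guard against floating-point round-off in the last cumulative sum.
    actions = np.minimum(actions, k - 1)
    # Only the reward of the chosen action is observed: 1 if it is the true class.
    rewards = (actions == labels).astype(float)
    # Logged propensity of the chosen action, needed by importance-weighted estimators.
    propensities = behavior_probs[np.arange(n), actions]
    return actions, rewards, propensities


# Hypothetical usage with synthetic labels and a uniform behavior policy.
rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 3
labels = rng.integers(0, n_classes, size=n_samples)
behavior_probs = np.full((n_samples, n_classes), 1.0 / n_classes)
actions, rewards, propensities = classification_to_bandit(labels, behavior_probs, rng)
```

The features themselves are left untouched by this step. The covariate shift between historical and evaluation data described under Dataset Splits is a separate construction that this sketch does not attempt to reproduce.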