Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
Authors: Masatoshi Uehara, Masahiro Kato, Shota Yasui
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments to confirm the effectiveness of the proposed estimators. In this section, we demonstrate the effectiveness of the proposed estimators using data obtained with bandit feedback. Following previous work (Dudík et al., 2011; Farajtabar et al., 2018), we evaluate the proposed estimators using the standard classification datasets from the UCI repository by transforming the classification data into contextual bandit data. From the UCI repository, we use the satimage, vehicle, and pendigits datasets. |
| Researcher Affiliation | Collaboration | Masatoshi Uehara¹, Masahiro Kato², Shota Yasui². ¹Cornell University, mu223@cornell.edu. ²CyberAgent Inc., masahiro_kato@cyberagent.co.jp, yasui_shota@cyberagent.co.jp |
| Pseudocode | Yes | Algorithm 1 Doubly Robust Estimator under a Covariate Shift |
| Open Source Code | No | The paper does not contain any statements about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | From the UCI repository, we use the satimage, vehicle, and pendigits datasets. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html |
| Dataset Splits | Yes | By adjusting Cprob, we classify 70% samples as the historical data and 30% samples as the evaluation data. For this estimator, we use 2-fold cross-fitting. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions statistical methods such as 'kernel ridge regression', 'KuLSIF', and 'Nadaraya-Watson regression', but it does not name any software packages or version numbers used for the implementation. |
| Experiment Setup | No | For DRCS, we use 2-fold cross-fitting and add a regularization term. More details, such as the description of the data and choice of hyperparameters, are in Appendix H. The main text does not contain specific hyperparameter values. |
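
The Research Type row quotes the paper as transforming UCI classification data into contextual bandit data, following Dudík et al. (2011). As a rough illustration of that general transformation only, a minimal sketch might look like the following. The function names `classification_to_bandit` and `ipw_value`, the array layouts, and the use of NumPy are all assumptions made here for illustration and do not come from the paper. The value estimate shown is plain inverse propensity weighting with no density-ratio correction, so it is not the doubly robust estimator under a covariate shift named in Algorithm 1.

```python
import numpy as np


def classification_to_bandit(labels, policy_probs, rng):
    """Turn multiclass labels into logged bandit feedback (illustrative helper).

    labels:       (n,) integer array of true classes.
    policy_probs: (n, k) behavior-policy probabilities over the k actions.
    Returns the sampled actions, the observed 0/1 rewards, and the
    propensities of the sampled actions.
    """
    n, k = policy_probs.shape
    # Inverse-CDF sampling: pick the first action whose cumulative
    # probability exceeds a uniform draw.
    cum = np.cumsum(policy_probs, axis=1)
    u = rng.random(n)
    actions = (u[:, None] >= cum).sum(axis=1)
    actions = np.minimum(actions, k - 1)  # guard against floating-point rounding
    # Only the reward of the chosen action is revealed: 1 if it matches the label.
    rewards = (actions == labels).astype(float)
    propensities = policy_probs[np.arange(n), actions]
    return actions, rewards, propensities


def ipw_value(actions, rewards, propensities, target_probs):
    """Plain inverse-propensity-weighted value estimate of a target policy."""
    weights = target_probs[np.arange(len(actions)), actions] / propensities
    return float(np.mean(weights * rewards))
```

Given a label vector and a behavior policy such as `np.full((n, k), 1.0 / k)`, calling `classification_to_bandit(labels, behavior, np.random.default_rng(0))` yields logged data that `ipw_value` can score against any target policy with the same `(n, k)` layout. Handling the 70%/30% historical and evaluation split described in the Dataset Splits row would additionally require reweighting by a covariate density ratio, which this sketch deliberately leaves out.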