Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Authors: Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves estimator selection compared to a non-adaptive heuristic.
Researcher Affiliation | Collaboration | Sony Group Corporation; Tokyo Institute of Technology; Yale University; Cornell University. Contacts: Takuma.Udagawa@sony.com, kiyohara.h.aa@m.titech.ac.jp, yusuke.narita@yale.edu, ys552@cornell.edu, Kei.Tateno@sony.com
Pseudocode | Yes | Algorithm 1: Policy-Adaptive Estimator Selection via Importance Fitting (PAS-IF)
Open Source Code | Yes | The experiment code is available at https://github.com/sony/ds-research-code/tree/master/aaai23-pasif
Open Datasets | Yes | The synthetic experiment is implemented on top of Open Bandit Pipeline (Saito et al. 2021a): https://github.com/st-tech/zr-obp
Dataset Splits | No | The paper describes data collection and subsampling for pseudo datasets but does not give specific train/validation/test splits (e.g., percentages or counts) for the main experiments.
Hardware Specification | No | The paper does not report hardware details such as CPU or GPU models, processor types, or memory used for the experiments.
Software Dependencies | No | The paper mentions Open Bandit Pipeline and SLOPE but does not give version numbers for these or any other software dependencies.
Experiment Setup | Yes | For PAS-IF, the authors set S = {0, 1, . . . , 9}, k = 0.2, η = 0.001, and T = 5,000, and select the regularization coefficient λ from {10^-1, 10^0, 10^1, 10^2, 10^3} by the procedure described in Section 4.
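The reported hyperparameters can be collected into a small configuration sketch. This is illustrative only: the variable names are invented here, and interpreting S as a set of random seeds and T as the number of training epochs are assumptions, not details taken from the authors' code.

```python
# Hedged sketch of the PAS-IF experiment setup reported in the paper.
# All names are hypothetical; only the numeric values come from the text.
pasif_config = {
    "seeds_S": list(range(10)),        # S = {0, 1, ..., 9} (assumed to be random seeds)
    "k": 0.2,                          # subsampling/splitting parameter k
    "learning_rate_eta": 0.001,        # η = 0.001
    "num_steps_T": 5000,               # T = 5,000 (assumed training iterations)
    # λ candidates: {10^-1, 10^0, 10^1, 10^2, 10^3}; the best value is
    # chosen by the selection procedure described in Section 4 of the paper.
    "lambda_candidates": [10.0 ** p for p in range(-1, 4)],
}

print(pasif_config["lambda_candidates"])
```

A tuning loop would then iterate over `lambda_candidates`, run importance fitting for each λ, and keep the value that the Section 4 procedure scores best.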