Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Authors: Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves the estimator selection compared to a non-adaptive heuristic. |
| Researcher Affiliation | Collaboration | ¹Sony Group Corporation, ²Tokyo Institute of Technology, ³Yale University, ⁴Cornell University. Takuma.Udagawa@sony.com, kiyohara.h.aa@m.titech.ac.jp, yusuke.narita@yale.edu, ys552@cornell.edu, Kei.Tateno@sony.com |
| Pseudocode | Yes | Algorithm 1: Policy-Adaptive Estimator Selection via Importance Fitting (PAS-IF) |
| Open Source Code | Yes | Our experiment code is available at https://github.com/sony/ds-research-code/tree/master/aaai23-pasif |
| Open Datasets | Yes | Note that our synthetic experiment is implemented on top of Open Bandit Pipeline (Saito et al. 2021a): https://github.com/st-tech/zr-obp |
| Dataset Splits | No | The paper describes data collection and subsampling for pseudo datasets but does not provide specific train/validation/test dataset splits (e.g., percentages or counts) for their main experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as CPU or GPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Open Bandit Pipeline" and "SLOPE" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For PAS-IF, we set S = {0, 1, . . . , 9}, k = 0.2, η = 0.001, and T = 5,000, and select the regularization coefficient λ from {10⁻¹, 10⁰, 10¹, 10², 10³} by a procedure described in Section 4. |
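The reported PAS-IF settings can be collected into a small configuration object for reruns. This is a hedged sketch: the key names below are illustrative (they do not come from the authors' code), and only the values are taken from the paper's experiment setup.

```python
# Illustrative PAS-IF hyperparameter configuration.
# Values are from the paper; key names are hypothetical.
pasif_config = {
    "random_seeds_S": list(range(10)),          # S = {0, 1, ..., 9}
    "subsampling_ratio_k": 0.2,                 # k = 0.2
    "learning_rate_eta": 1e-3,                  # eta = 0.001
    "n_training_steps_T": 5_000,                # T = 5,000
    # Candidate regularization coefficients: {10^-1, 10^0, 10^1, 10^2, 10^3};
    # the paper selects lambda from this grid via the procedure in its Section 4.
    "lambda_candidates": [10.0 ** p for p in range(-1, 4)],
}

print(pasif_config["lambda_candidates"])
```

The grid form makes it easy to sweep λ with whatever tuning procedure one reimplements from Section 4 of the paper.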