Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Authors: Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on both synthetic and real-world company data demonstrate that the proposed procedure substantially improves estimator selection compared to a non-adaptive heuristic.
Researcher Affiliation | Collaboration | Sony Group Corporation; Tokyo Institute of Technology; Yale University; Cornell University. Contacts: Takuma.Udagawa@sony.com, kiyohara.h.aa@m.titech.ac.jp, yusuke.narita@yale.edu, ys552@cornell.edu, Kei.Tateno@sony.com
Pseudocode | Yes | Algorithm 1: Policy-Adaptive Estimator Selection via Importance Fitting (PAS-IF)
Open Source Code | Yes | The experiment code is available at https://github.com/sony/ds-research-code/tree/master/aaai23-pasif
Open Datasets | Yes | The synthetic experiment is implemented on top of Open Bandit Pipeline (Saito et al. 2021a): https://github.com/st-tech/zr-obp
Dataset Splits | No | The paper describes data collection and subsampling for pseudo datasets but does not give specific train/validation/test splits (e.g., percentages or counts) for the main experiments.
Hardware Specification | No | The paper does not report hardware details such as CPU or GPU models, processor types, or memory used for the experiments.
Software Dependencies | No | The paper mentions Open Bandit Pipeline and SLOPE but does not give version numbers for these or any other software dependencies.
Experiment Setup | Yes | For PAS-IF, the authors set S = {0, 1, . . . , 9}, k = 0.2, η = 0.001, and T = 5,000, and select the regularization coefficient λ from {10^-1, 10^0, 10^1, 10^2, 10^3} by the procedure described in Section 4.
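The reported hyperparameters can be collected into a small configuration sketch. This is illustrative only: the variable names are invented here, and interpreting S as a set of random seeds and T as the number of training epochs are assumptions, not details taken from the authors' code.

```python
# Hedged sketch of the PAS-IF experiment setup reported in the paper.
# All names are hypothetical; only the numeric values come from the text.
pasif_config = {
    "seeds_S": list(range(10)),        # S = {0, 1, ..., 9} (assumed to be random seeds)
    "k": 0.2,                          # subsampling/splitting parameter k
    "learning_rate_eta": 0.001,        # η = 0.001
    "num_steps_T": 5000,               # T = 5,000 (assumed training iterations)
    # λ candidates: {10^-1, 10^0, 10^1, 10^2, 10^3}; the best value is
    # chosen by the selection procedure described in Section 4 of the paper.
    "lambda_candidates": [10.0 ** p for p in range(-1, 4)],
}

print(pasif_config["lambda_candidates"])
```

A tuning loop would then iterate over `lambda_candidates`, run importance fitting for each λ, and keep the value that the Section 4 procedure scores best.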