An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Authors: Yang Xu, Jin Zhu, Chengchun Shi, Shikai Luo, Rui Song

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, we propose a number of policy value estimators and illustrate their effectiveness through extensive simulations and real data analysis from a world-leading short-video platform. In this section, we will conduct detailed comparisons between our estimator and other state-of-the-art methods for OPE estimation under MDPs via synthetic data (Section 7.1) and real-world data (Section 7.2).
Researcher Affiliation | Collaboration | (1) Department of Statistics, North Carolina State University, Raleigh, USA; (2) Sun Yat-sen University, China; (3) ByteDance, China; (4) Department of Statistics, London School of Economics and Political Science, London, UK.
Pseudocode | Yes | Algorithm 1 Model Selection for IV-based confounded OPE
Open Source Code | Yes | The source code is available on github: https://github.com/YangXU63/IVMDP.
Open Datasets | No | The paper mentions using "synthetic data" and a "real dataset from a world-leading technological company" for which they "generate a synthetic data environment based on the real data due to privacy considerations". No specific link, DOI, or formal citation is provided for public access to either the real or the generated synthetic dataset.
Dataset Splits | No | The paper describes the data-generating process for simulations but does not specify how the data are split into training, validation, or test sets. It mentions "N = 1000 trajectories, each with T = 100 time points" but not the partitioning for model evaluation.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow).
Experiment Setup | No | The paper describes the data generation process and some parameters for a toy example (e.g., N, T, shift parameters alpha and beta). However, it does not provide specific hyperparameters or system-level training settings (e.g., learning rate, batch size, optimizer details, number of epochs) for the estimators described in the paper.