Combining Experimental and Historical Data for Policy Evaluation

Authors: Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
Researcher Affiliation | Collaboration | (1) School of Statistics and Management, Shanghai University of Finance and Economics; (2) Department of Statistics, London School of Economics and Political Science; (3) Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University; (4) Didi Chuxing; (5) Department of Biostatistics, The University of North Carolina at Chapel Hill.
Pseudocode | Yes | Algorithm 1 (Bootstrap-assisted procedure). Input: real data {(S_it, R_it) : 1 ≤ i ≤ N; 1 ≤ t ≤ T}, the adjustment parameters for the ratios (δ1, δ2), the assignment of actions, the bootstrapped sample size (n = |D^e| or n = |D^h|, where |D^h| = m|D^e|), the shifted mean parameter b_h and the standard deviation parameter d, a random seed, and the number of replications B = 200. (A minimal bootstrap sketch appears after the table.)
Open Source Code | Yes | R code implementing the proposed weighted estimators is available at https://github.com/tingstat/Data_Combination.
Open Datasets | Yes | Example 6.2 (Ridesharing-data based sequential simulation). In this example, we build a simulation environment based on a real dataset collected from a ridesharing company... Example 6.4 (Clinical-data based non-dynamic simulation). The data for this experiment is sourced from the AIDS Clinical Trials Group Protocol 175, involving 2139 HIV-infected individuals. Participants were randomly assigned to one of four treatment groups: zidovudine (ZDV) monotherapy, ZDV+didanosine (ddI), ZDV+zalcitabine, or ddI monotherapy (Hammer et al., 1996).
Dataset Splits | No | The paper describes generating experimental and historical datasets for simulations and using bootstrap methods, but it does not specify explicit training, validation, or test dataset splits with percentages or sample counts for model training or evaluation.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'R code' for implementation but does not specify any software names with version numbers for replication, such as specific R packages or other libraries.
Experiment Setup | Yes | Example 6.1 (Continued). We consider the reward functions R^e = 10 + b_h + A^e + S^e + (2 + d)ε^e and R^h = 10 + S^h + ε^h, where S^e, S^h and ε^e, ε^h are drawn from the standard normal distribution N(0, 1). The experimental sample size is |D^e| = 48 with a horizon of T = 1. The sample size of the historical data is set to |D^h| = m|D^e| with m ∈ {1, 2, 3}. We consider the switchback design, which alternates treatment and control over time. We set b_h to range over the set {0, 0.1, 0.2, ..., 1.5}. We also vary the conditional variance of the reward and use d to characterize this difference (see Appendix A for its detailed definition). (A minimal R sketch of this data-generating mechanism appears after the table.)
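
The Experiment Setup row quotes the Example 6.1 data-generating mechanism. Below is a minimal R sketch of that mechanism; the chosen values of b_h, d, m, and the random seed, as well as the {0, 1} coding of the switchback action, are illustrative assumptions rather than settings taken from the paper.

```r
## Minimal sketch of the Example 6.1 data-generating mechanism (illustrative values).
set.seed(1)

n_e <- 48                   # |D^e| = 48, horizon T = 1
m   <- 2                    # historical-to-experimental ratio, m in {1, 2, 3}
n_h <- m * n_e              # |D^h| = m * |D^e|
b_h <- 0.5                  # shifted mean parameter, grid {0, 0.1, ..., 1.5}
d   <- 0                    # conditional-variance parameter (see Appendix A of the paper)

## Switchback design: alternate treatment (1) and control (0) across the units.
## The {0, 1} action coding is an assumption, not fixed in the excerpt above.
A_e <- rep(c(1, 0), length.out = n_e)

## Experimental rewards: R^e = 10 + b_h + A^e + S^e + (2 + d) * eps^e.
S_e   <- rnorm(n_e)
eps_e <- rnorm(n_e)
R_e   <- 10 + b_h + A_e + S_e + (2 + d) * eps_e

## Historical rewards: R^h = 10 + S^h + eps^h.
S_h   <- rnorm(n_h)
eps_h <- rnorm(n_h)
R_h   <- 10 + S_h + eps_h

experimental <- data.frame(A = A_e, S = S_e, R = R_e)
historical   <- data.frame(S = S_h, R = R_h)
```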
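The Pseudocode row refers to Algorithm 1, the bootstrap-assisted procedure. The sketch below only illustrates the general shape of such a procedure under stated assumptions: `naive_estimator()` is a hypothetical placeholder rather than the paper's weighted estimator (the authors' estimators, which also take the ratio-adjustment parameters δ1 and δ2, are in the repository linked above), and the resampling shown is a plain nonparametric bootstrap, not the authors' exact algorithm.

```r
## Minimal sketch of a bootstrap-assisted evaluation loop with B = 200 replications.
## `naive_estimator()` is a hypothetical stand-in for the paper's weighted estimator.

naive_estimator <- function(exp_data, hist_data) {
  ## Illustrative only: treated-arm mean minus a pooled control mean that borrows
  ## the historical rewards as extra control observations.
  treated <- exp_data$R[exp_data$A == 1]
  control <- c(exp_data$R[exp_data$A == 0], hist_data$R)
  mean(treated) - mean(control)
}

bootstrap_assisted <- function(exp_data, hist_data, B = 200, seed = 1) {
  set.seed(seed)
  replicate(B, {
    ## Resample each data set with replacement and re-evaluate the estimator.
    exp_b  <- exp_data[sample.int(nrow(exp_data), replace = TRUE), , drop = FALSE]
    hist_b <- hist_data[sample.int(nrow(hist_data), replace = TRUE), , drop = FALSE]
    naive_estimator(exp_b, hist_b)
  })
}

## Usage with the data frames from the sketch above:
boot_est <- bootstrap_assisted(experimental, historical, B = 200)
quantile(boot_est, c(0.025, 0.975))   # percentile bootstrap interval
```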