Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making
Authors: Ting Li, Chengchun Shi, Jianing Wang, Fan Zhou, Hongtu Zhu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper studies optimal designs that aim to maximize the amount of information obtained from online experiments to estimate treatment effects accurately. We propose three optimal allocation strategies in a dynamic setting where treatments are sequentially assigned over time. These strategies are designed to minimize the variance of the treatment effect estimator when data follow a non-Markov decision process or a (time-varying) Markov decision process. We further develop estimation procedures based on existing off-policy evaluation (OPE) methods and conduct extensive experiments in various environments to demonstrate the effectiveness of the proposed methodologies. (A generic importance-sampling OPE estimator of the kind the paper builds on is sketched after the table.) |
| Researcher Affiliation | Academia | 1School of Statistics and Management, Shanghai University of Finance and Economics 2Department of Statistics, London School of Economics and Political Science 3Department of Biostatistics, University of North Carolina at Chapel Hill |
| Pseudocode | Yes | Algorithm 1: Treatment allocation algorithm for NMDPs; Algorithm 2: Treatment allocation algorithm for TMDPs. (A simplified burn-in-then-allocate loop is sketched after the table.) |
| Open Source Code | Yes | Python code implementing the proposed algorithms is available at https://github.com/tingstat/MDP_design. |
| Open Datasets | No | The paper describes using "real datasets gathered from a globally recognized ride-sharing company to create a city-scale synthetic environment" and a "city-scale order-driver historical dataset from a world-leading ride-sharing platform" for its experiments. However, it does not provide specific access information (links, DOIs, formal citations for publicly available versions) for these datasets. Examples 5.1 and 5.2 use simulated data without public access details. |
| Dataset Splits | No | The paper mentions a "burn-in period m0" and an "online manner" for updates, but it does not specify traditional dataset splits (e.g., percentages or counts for training, validation, or test sets) for model development or evaluation, nor does it mention cross-validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions "Python code implementing the proposed algorithms" but does not specify any particular software libraries, frameworks, or their version numbers used for the experiments. |
| Experiment Setup | Yes | We set the burn-in period m0 to n/4 in all the experiments. Detailed information on the data-generating processes can be found in Section S2 of the supplementary material. Example 5.1 (Binary Observations). ...when ps = 0.8, T = 50, and n = 50. Example 5.3 (Synthetic Dispatch). ...a duration of 20 time steps per day. ...The number of days is set to n ∈ {30, 50, 100}. All methods are tested using 100 orders, with the number of drivers either drawn from the uniform distribution U(25, 30) or fixed at 25 or 50. |
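
The paper's Algorithms 1 and 2 are not reproduced here, but the burn-in-then-allocate pattern they share can be illustrated with a much simpler single-stage analogue. The sketch below randomizes uniformly during the burn-in period m0 = n/4 and then assigns treatment with a probability chosen to shrink the estimator's variance. Classical Neyman allocation and the toy outcome model stand in for the paper's variance criterion for OPE under (T)MDP dynamics; every name and number here is an illustrative assumption, not the paper's specification.

```python
# Hypothetical sketch of adaptive treatment allocation with a burn-in period.
# This is NOT the paper's Algorithm 1/2; it illustrates the general recipe of
# randomizing during burn-in, then allocating adaptively to reduce the
# variance of a (here, difference-in-means) treatment effect estimator.
import numpy as np

rng = np.random.default_rng(0)

def neyman_prob(y1, y0):
    """Treatment probability proportional to the treated-arm outcome std.

    Neyman allocation minimizes the variance of the difference-in-means
    estimator; the paper's criterion instead targets the variance of an
    off-policy value estimator under (T)MDP dynamics.
    """
    s1 = np.std(y1, ddof=1) if len(y1) > 1 else 1.0
    s0 = np.std(y0, ddof=1) if len(y0) > 1 else 1.0
    return s1 / (s1 + s0)

n = 200                      # total experimental units (illustrative)
m0 = n // 4                  # burn-in period, matching the paper's m0 = n/4
y1, y0 = [], []              # observed outcomes per arm

for i in range(n):
    # Uniform randomization during burn-in, adaptive allocation afterwards.
    p = 0.5 if i < m0 else neyman_prob(y1, y0)
    a = rng.binomial(1, p)
    # Hypothetical outcome model: treatment shifts both mean and variance.
    y = rng.normal(1.0 + a, 1.0 + 2.0 * a)
    (y1 if a == 1 else y0).append(y)

ate_hat = np.mean(y1) - np.mean(y0)
print(f"difference-in-means ATE estimate: {ate_hat:.3f}")
```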
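
The paper develops its estimation procedures on top of existing OPE methods. For context, here is a minimal sketch of trajectory-wise importance sampling, one standard OPE estimator, applied to a toy finite-horizon environment. The environment, policies, horizon, and trajectory count are placeholder assumptions and do not come from the paper.

```python
# Minimal sketch of trajectory-wise importance-sampling OPE -- one of the
# standard off-policy evaluation estimators this line of work builds on.
import numpy as np

rng = np.random.default_rng(1)
T = 5  # toy horizon (the paper's Example 5.3 uses 20 time steps per day)

def behavior_policy(s):
    return 0.5                      # P(a=1 | s) under the logging policy

def target_policy(s):
    return 0.8                      # P(a=1 | s) under the policy to evaluate

def step(s, a):
    """Toy transition/reward model; stands in for the real environment."""
    s_next = 0.9 * s + a + rng.normal()
    return s_next, s_next           # reward equals the next state here

def is_estimate(n_traj=5000):
    values = []
    for _ in range(n_traj):
        s, w, g = 0.0, 1.0, 0.0
        for _ in range(T):
            p_b = behavior_policy(s)
            a = rng.binomial(1, p_b)
            p_t = target_policy(s)
            # Accumulate the target-vs-behavior policy likelihood ratio.
            w *= (p_t if a == 1 else 1 - p_t) / (p_b if a == 1 else 1 - p_b)
            s, r = step(s, a)
            g += r
        values.append(w * g)
    return np.mean(values)

print(f"IS estimate of the target policy's value: {is_estimate():.3f}")
```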