Deeply-Debiased Off-Policy Interval Estimation
Authors: Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE. ... In this section, we evaluate the empirical performance of our method using two synthetic datasets |
| Researcher Affiliation | Academia | (1) Department of Statistics, London School of Economics and Political Science, London, United Kingdom; (2) Department of Statistics, North Carolina State University, Raleigh, USA; (3) Department of Economics, Massachusetts Institute of Technology, Cambridge, USA. Correspondence to: Rui Song <rsong@ncsu.edu>. |
| Pseudocode | No | The paper describes its procedure in numbered steps but does not provide a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE. |
| Open Datasets | Yes | Take the Ohio T1DM dataset (Marling & Bunescu, 2018) as an example, only a few thousands observations are available (Shi et al., 2020b). ... using two synthetic datasets: CartPole from the OpenAI Gym environment (Brockman et al., 2016) and a simulation environment (referred to as Diabetes) to simulate the Ohio T1DM data (Shi et al., 2020b). |
| Dataset Splits | Yes | Step 1. Data Splitting. We randomly divide the indices of all trajectories $\{1, \dots, n\}$ into $K \ge 2$ disjoint subsets. Denote the $k$th subset by $\mathcal{I}_k$ and let $\mathcal{I}_k^c = \{1, \dots, n\} \setminus \mathcal{I}_k$. Data splitting allows us to use one part of data ($\mathcal{I}_k^c$) to train RL models and the remaining part ($\mathcal{I}_k$) to do the estimation of the main parameter, i.e., $\eta^{\pi}$. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions 'A Python implementation' and using 'random forest' but does not provide specific version numbers for Python, random forest libraries, or any other software dependencies. |
| Experiment Setup | Yes | We set T = 300 and γ = 0.98 for CartPole, and T = 200 and γ = 0.95 for Diabetes. For both environments, we vary the number of trajectories n and the temperature τ to design different settings. Results are aggregated over 200 replications. |
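
The data-splitting step quoted in the Dataset Splits row can be illustrated with a short sketch. This is not taken from the authors' repository: the function name `split_trajectory_indices`, the `seed` argument, and the use of NumPy are assumptions made for illustration, and indices are 0-based here, whereas the paper writes them as 1 through n.

```python
import numpy as np


def split_trajectory_indices(n, K, seed=0):
    """Randomly partition trajectory indices {0, ..., n-1} into K disjoint folds.

    Returns a list of (train_idx, est_idx) pairs, one per fold k:
      est_idx   plays the role of I_k   (used to estimate the value)
      train_idx plays the role of I_k^c (used to fit the nuisance RL models)
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if K < 2 or K > n:
        raise ValueError("need 2 <= K <= n")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)          # random ordering of all indices
    folds = np.array_split(perm, K)    # K nearly equal, disjoint subsets
    splits = []
    for k in range(K):
        est_idx = np.sort(folds[k])
        # The complement of fold k is the union of all other folds.
        train_idx = np.sort(
            np.concatenate([folds[j] for j in range(K) if j != k])
        )
        splits.append((train_idx, est_idx))
    return splits


if __name__ == "__main__":
    for train_idx, est_idx in split_trajectory_indices(n=10, K=3, seed=1):
        print("train:", train_idx, "estimate:", est_idx)
```

Each trajectory index appears in exactly one estimation fold, so no trajectory is used both to fit the models and to evaluate the estimator within the same fold.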