Off-Policy Evaluation with Policy-Dependent Optimization Response
Authors: Wenshuo Guo, Michael Jordan, Angela Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We corroborate our theoretical results with numerical simulations. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem, where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W1, …, Wm, and m′ = 300 right-side nodes. The policy πt applies treatments to the left-side nodes, and the outcome is the cost of the edges incident to that node. While we grow the training data size, we fix m, m′ (with m > m′) and evaluate over ten random draws of train/test data for each value of n. Figure 2 plots the results. When there is mis-specification, even a large training dataset cannot correct the bias of the direct method, whereas both WDM and GRDR enjoy smaller and decreasing MSE. |
| Researcher Affiliation | Academia | Wenshuo Guo, Department of EECS, University of California, Berkeley (wguo@cs.berkeley.edu); Michael I. Jordan, Department of EECS and Department of Statistics, University of California, Berkeley (jordan@cs.berkeley.edu); Angela Zhou, Department of Data Sciences and Operations, University of Southern California (zhoua@usc.edu) |
| Pseudocode | Yes | Algorithm 1 (Perturbation method, Alg. 2 of Ito et al. [2018]): input: estimation strategy ∈ {WDM, GRDR}; h: finite-difference parameter; π: policy. ... Algorithm 2 (Subgradient method for policy optimization): 1: Input: step size η, linear objective function f. 2: for j = 1, 2, … do ... |
| Open Source Code | No | All code will be published. This indicates future availability, not current concrete access. |
| Open Datasets | No | Since real data suitable for both policy evaluation and downstream optimization is unavailable, we focus on synthetic data and downstream bipartite matching. We generated dataset D1 = {(W, T, c)} with covariate W ∼ N(0, 1), confounded treatment T, and outcome c. The paper does not provide access information for this synthetic data. |
| Dataset Splits | No | The paper mentions "train/test data" and "training data size", but does not explicitly specify a validation split or the percentages/counts for the train/test splits. It states "evaluate over ten random draw of train/test data for each value of n". |
| Hardware Specification | No | The paper does not contain any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We generated dataset D1 = {(W, T, c)} with covariate W ∼ N(0, 1), confounded treatment T, and outcome c. Treatment is drawn with probability b_t(W) = (1 + e^(−(φ1 W + φ2)))^(−1), φ1 = φ2 = 0.5. The true outcome model is given by a degree-2 polynomial, c_t(w) = poly(t, w) + ε, where ε ∼ N(0, 1). In the mis-specified setting that induces confounding, the outcome model is a vanilla linear regression over W without the polynomial expansion. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem, where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W1, …, Wm, and m′ = 300 right-side nodes. We consider a logistic policy π_t(W) = sigmoid(φ1 W + φ2). To study the convergence and the effectiveness of the subgradient algorithm for minimization, we fix a test set and perform subgradient descent over 60 iterations for each run. If not stated otherwise we spread the coefficients as poly(t, w) = (1, w, t, w², wt, t²)(5, 1, 1, 2, 2, 1)^⊤. |
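The synthetic data-generating process quoted in the Experiment Setup row can be sketched directly. This is a minimal illustration, not the authors' code: the function and variable names are ours, and the sign convention inside the sigmoid propensity is our reading of the (garbled) formula, taken to match the logistic policy sigmoid(φ1 W + φ2) stated later in the same row.

```python
import numpy as np

PHI1, PHI2 = 0.5, 0.5                 # logistic propensity coefficients from the paper
BETA = np.array([5, 1, 1, 2, 2, 1])   # coefficients for (1, w, t, w^2, wt, t^2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def poly(t, w):
    """Degree-2 polynomial outcome model: (1, w, t, w^2, wt, t^2) . BETA."""
    features = np.stack([np.ones_like(w), w, t, w ** 2, w * t, t ** 2], axis=-1)
    return features @ BETA

def generate_dataset(n, rng):
    """Draw D1 = {(W, T, c)}: Gaussian covariate, confounded binary
    treatment, and polynomial outcome with additive N(0, 1) noise."""
    W = rng.standard_normal(n)
    T = (rng.random(n) < sigmoid(PHI1 * W + PHI2)).astype(float)
    c = poly(T, W) + rng.standard_normal(n)
    return W, T, c

W, T, c = generate_dataset(500, np.random.default_rng(0))
```

The mis-specified direct method from the quoted setup would then regress c on W alone (a plain linear fit), omitting the polynomial expansion above.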
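The downstream task in the Research Type row is min-cost bipartite matching on an m = 500 by m′ = 300 graph. As a rough sketch of that optimization step (not the paper's pipeline), a rectangular cost matrix can be matched with SciPy's Hungarian-algorithm solver; the random costs here are placeholders for the policy-dependent edge costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
m, m_prime = 500, 300                 # left- and right-side node counts from the paper

# Placeholder edge costs; in the paper these depend on the causal policy.
cost = rng.random((m, m_prime))

# Min-cost matching: pairs min(m, m') nodes, leaving the rest unmatched.
rows, cols = linear_sum_assignment(cost)
total = cost[rows, cols].sum()
```

`linear_sum_assignment` accepts rectangular matrices, so exactly m′ = 300 left-side nodes are matched, mirroring the m > m′ setting quoted above.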
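Algorithm 2 in the Pseudocode row is a standard subgradient method run for 60 iterations. A generic sketch of that loop follows; the step size, the best-iterate bookkeeping, and the toy objective are ours, not the paper's policy objective.

```python
import numpy as np

def subgradient_descent(subgrad, x0, step=0.1, iters=60):
    """Plain subgradient method x_{j+1} = x_j - step * g_j, tracking the
    best iterate seen (subgradient steps need not decrease the objective)."""
    x = np.asarray(x0, dtype=float)
    best, best_val = x.copy(), None
    for _ in range(iters):
        g, val = subgrad(x)          # subgradient and objective value at x
        if best_val is None or val < best_val:
            best, best_val = x.copy(), val
        x = x - step * g
    return best, best_val

# Toy example: minimize f(x) = |x - 3|, whose subgradient is sign(x - 3).
f = lambda x: (np.sign(x - 3.0), abs(x - 3.0))
x_star, f_star = subgradient_descent(f, x0=0.0, step=0.1, iters=60)
```

With a fixed step size the iterates oscillate around the minimizer within one step length, which is why the best iterate is returned rather than the last one.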