Off-Policy Evaluation with Policy-Dependent Optimization Response

Authors: Wenshuo Guo, Michael Jordan, Angela Zhou

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We corroborate our theoretical results with numerical simulations. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W₁, …, W_m and m′ = 300 right-side nodes. The policy π applies treatments to the left-side nodes, and the outcome is the cost of the edges incident to the treated node. As we grow the training data size, we fix m and m′ (with m > m′) and evaluate over ten random draws of train/test data for each value of n. Figure 2 plots the results. Under mis-specification, even a large training dataset cannot correct the bias of the direct method, whereas both WDM and GRDR enjoy smaller and decreasing MSE.
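To make the downstream objective concrete, here is a minimal Python sketch of the policy-dependent min-cost matching evaluation, using the node counts, propensity, and polynomial cost model quoted above; the `behavior_policy` and `node_cost` helper names and the small edge-level noise are our illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
m, m_prime = 500, 300                      # left/right node counts from the paper

W = rng.normal(size=m)                     # left-node covariates W_i ~ N(0, 1)

def behavior_policy(w, phi1=0.5, phi2=0.5):
    """Propensity pi_b(W) = (1 + e^{phi1 W + phi2})^{-1} from the paper."""
    return 1.0 / (1.0 + np.exp(phi1 * w + phi2))

T = rng.binomial(1, behavior_policy(W))    # sampled treatments on the left nodes

def node_cost(w, t):
    """Degree-2 polynomial outcome poly(t, w) with coefficients (5,1,1,2,2,1)."""
    return 5 + w + t + 2 * w**2 + 2 * w * t + t**2

# Cost matrix: every edge incident to left node i inherits c_{T_i}(W_i);
# the small edge-level noise is our assumption, added only to break ties.
C = node_cost(W, T)[:, None] + rng.normal(scale=0.1, size=(m, m_prime))

rows, cols = linear_sum_assignment(C)      # min-cost matching (m > m_prime)
print("matching objective:", C[rows, cols].sum())
```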
Researcher Affiliation | Academia | Wenshuo Guo, Department of EECS, University of California, Berkeley (wguo@cs.berkeley.edu); Michael I. Jordan, Department of EECS and Department of Statistics, University of California, Berkeley (jordan@cs.berkeley.edu); Angela Zhou, Department of Data Sciences and Operations, University of Southern California (zhoua@usc.edu)
Pseudocode | Yes | Algorithm 1: Perturbation method (Alg. 2 of Ito et al. [2018]). Input: estimation strategy ∈ {WDM, GRDR}; h: finite-difference parameter; π: policy. ... Algorithm 2: Subgradient method for policy optimization. 1: Input: step size η, linear objective function f. 2: for j = 1, 2, … do ...
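Since the listing above is truncated, the following generic sketch shows only the shape of the two routines: central finite differences as a stand-in for the perturbation step of Ito et al. [2018] (not a verbatim reproduction of their Alg. 2) and a plain subgradient loop; the objective, step size, and all names are placeholder assumptions:

```python
import numpy as np

def perturbation_gradient(f, theta, h=1e-3):
    """Central finite differences as a stand-in for the perturbation step."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return grad

def subgradient_method(f, theta0, eta=0.05, iters=60):
    """Plain subgradient loop; 60 iterations matches the experiment setup."""
    theta = theta0.copy()
    for _ in range(iters):
        theta = theta - eta * perturbation_gradient(f, theta)
    return theta

# Toy usage on a smooth placeholder objective (illustrative only):
print(subgradient_method(lambda th: np.sum((th - 1.0) ** 2), np.zeros(2)))
```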
Open Source Code | No | "All code will be published." This indicates future availability, not current concrete access.
Open Datasets | No | Since real data suitable for both policy evaluation and downstream optimization is unavailable, the authors focus on synthetic data and downstream bipartite matching: "We generated dataset D1 = {(W, T, c)} with covariate W ~ N(0, 1), confounded treatment T, and outcome c." The paper does not provide access information for this synthetic data.
Dataset Splits | No | The paper mentions "train/test data" and "training data size", but does not explicitly specify a validation split or the percentages/counts for the train/test splits. It states that it evaluates "over ten random draws of train/test data for each value of n".
Hardware Specification | No | The paper does not contain any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We generated dataset D1 = {(W, T, c)} with covariate W ~ N(0, 1), confounded treatment T, and outcome c. Treatment is drawn with probability π_b(W) = (1 + e^{φ₁ W + φ₂})^{-1}, with φ₁ = φ₂ = 0.5. The true outcome model is given by a degree-2 polynomial, c_t(w) = poly(t, w) + ε, where ε ~ N(0, 1). In the mis-specified setting that induces confounding, the outcome model is a vanilla linear regression over W without the polynomial expansion. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W₁, …, W_m and m′ = 300 right-side nodes. We consider a logistic policy π(W) = sigmoid(φ₁ W + φ₂). To study the convergence and the effectiveness of the subgradient algorithm for minimization, we fix a test set and perform subgradient descent over 60 iterations for each run. Unless stated otherwise, we spread the coefficients as poly(t, w) = (1, w, t, w², wt, t²)(5, 1, 1, 2, 2, 1)ᵀ.
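For concreteness, here is a minimal sketch of the data-generating process just described, using the stated propensity, polynomial coefficients, and Gaussian noise; the `generate_d1` helper name and the least-squares baseline are our illustration of the "vanilla linear regression over W" baseline, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_d1(n, phi1=0.5, phi2=0.5):
    """Draw D1 = {(W, T, c)}: Gaussian covariate, logistic treatment, poly outcome."""
    W = rng.normal(size=n)                            # W ~ N(0, 1)
    p = 1.0 / (1.0 + np.exp(phi1 * W + phi2))         # pi_b(W) = (1 + e^{phi1 W + phi2})^{-1}
    T = rng.binomial(1, p)                            # confounded treatment
    feats = np.stack([np.ones(n), W, T, W**2, W * T, T**2], axis=1)
    beta = np.array([5.0, 1.0, 1.0, 2.0, 2.0, 1.0])   # poly(t, w) coefficients
    c = feats @ beta + rng.normal(size=n)             # c_t(w) = poly(t, w) + eps
    return W, T, c

# Mis-specified direct method: regress c on W alone (no polynomial expansion),
# so the model stays wrong no matter how large n grows.
W, T, c = generate_d1(5000)
X = np.stack([np.ones_like(W), W], axis=1)
coef, *_ = np.linalg.lstsq(X, c, rcond=None)
print("mis-specified linear fit:", coef)
```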