Off-Policy Evaluation with Policy-Dependent Optimization Response

Authors: Wenshuo Guo, Michael Jordan, Angela Zhou

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We corroborate our theoretical results with numerical simulations. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W₁, …, W_m and m′ = 300 right-side nodes. The policy π applies treatments to the left-side nodes, and the outcome is the cost of the edges incident to the treated node. As we grow the training data size, we fix m and m′ (with m > m′) and evaluate over ten random draws of train/test data for each value of n. Figure 2 plots the results. Under mis-specification, even a large training dataset cannot correct the bias of the direct method, whereas both WDM and GRDR enjoy smaller and decreasing MSE.
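To make the downstream objective concrete, here is a minimal Python sketch of the policy-dependent min-cost matching evaluation, using the node counts, propensity, and polynomial cost model quoted above; the `behavior_policy` and `node_cost` helper names and the small edge-level noise are our illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
m, m_prime = 500, 300                      # left/right node counts from the paper

W = rng.normal(size=m)                     # left-node covariates W_i ~ N(0, 1)

def behavior_policy(w, phi1=0.5, phi2=0.5):
    """Propensity pi_b(W) = (1 + e^{phi1 W + phi2})^{-1} from the paper."""
    return 1.0 / (1.0 + np.exp(phi1 * w + phi2))

T = rng.binomial(1, behavior_policy(W))    # sampled treatments on the left nodes

def node_cost(w, t):
    """Degree-2 polynomial outcome poly(t, w) with coefficients (5,1,1,2,2,1)."""
    return 5 + w + t + 2 * w**2 + 2 * w * t + t**2

# Cost matrix: every edge incident to left node i inherits c_{T_i}(W_i);
# the small edge-level noise is our assumption, added only to break ties.
C = node_cost(W, T)[:, None] + rng.normal(scale=0.1, size=(m, m_prime))

rows, cols = linear_sum_assignment(C)      # min-cost matching (m > m_prime)
print("matching objective:", C[rows, cols].sum())
```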
Researcher Affiliation | Academia | Wenshuo Guo, Department of EECS, University of California, Berkeley (wguo@cs.berkeley.edu); Michael I. Jordan, Department of EECS and Department of Statistics, University of California, Berkeley (jordan@cs.berkeley.edu); Angela Zhou, Department of Data Sciences and Operations, University of Southern California (zhoua@usc.edu)
Pseudocode | Yes | Algorithm 1: Perturbation method (Alg. 2 of Ito et al. [2018]). Input: estimation strategy ∈ {WDM, GRDR}; h: finite-difference parameter; π: policy. ... Algorithm 2: Subgradient method for policy optimization. 1: Input: step size η, linear objective function f. 2: for j = 1, 2, … do ...
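Since the listing above is truncated, the following generic sketch shows only the shape of the two routines: central finite differences as a stand-in for the perturbation step of Ito et al. [2018] (not a verbatim reproduction of their Alg. 2) and a plain subgradient loop; the objective, step size, and all names are placeholder assumptions:

```python
import numpy as np

def perturbation_gradient(f, theta, h=1e-3):
    """Central finite differences as a stand-in for the perturbation step."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return grad

def subgradient_method(f, theta0, eta=0.05, iters=60):
    """Plain subgradient loop; 60 iterations matches the experiment setup."""
    theta = theta0.copy()
    for _ in range(iters):
        theta = theta - eta * perturbation_gradient(f, theta)
    return theta

# Toy usage on a smooth placeholder objective (illustrative only):
print(subgradient_method(lambda th: np.sum((th - 1.0) ** 2), np.zeros(2)))
```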
Open Source Code | No | "All code will be published." This indicates future availability, not current concrete access.
Open Datasets | No | Since real data suitable for both policy evaluation and downstream optimization is unavailable, the authors focus on synthetic data and downstream bipartite matching: "We generated dataset D1 = {(W, T, c)} with covariate W ~ N(0, 1), confounded treatment T, and outcome c." The paper does not provide access information for this synthetic data.
Dataset Splits | No | The paper mentions "train/test data" and "training data size", but does not explicitly specify a validation split or the percentages/counts for the train/test splits. It states that it evaluates "over ten random draws of train/test data for each value of n".
Hardware Specification | No | The paper does not contain any specific details about the hardware used for the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We generated dataset D1 = {(W, T, c)} with covariate W ~ N(0, 1), confounded treatment T, and outcome c. Treatment is drawn with probability π_b(W) = (1 + e^{φ₁ W + φ₂})^{-1}, with φ₁ = φ₂ = 0.5. The true outcome model is given by a degree-2 polynomial, c_t(w) = poly(t, w) + ε, where ε ~ N(0, 1). In the mis-specified setting that induces confounding, the outcome model is a vanilla linear regression over W without the polynomial expansion. For the policy-dependent optimization, we evaluate a min-cost bipartite matching problem where the causal policy intervenes on the edge costs (as detailed in Example 2.3). Specifically, the bipartite graph contains m = 500 left-side nodes W₁, …, W_m and m′ = 300 right-side nodes. We consider a logistic policy π(W) = sigmoid(φ₁ W + φ₂). To study the convergence and the effectiveness of the subgradient algorithm for minimization, we fix a test set and perform subgradient descent over 60 iterations for each run. Unless stated otherwise, we spread the coefficients as poly(t, w) = (1, w, t, w², wt, t²)(5, 1, 1, 2, 2, 1)ᵀ.
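For concreteness, here is a minimal sketch of the data-generating process just described, using the stated propensity, polynomial coefficients, and Gaussian noise; the `generate_d1` helper name and the least-squares baseline are our illustration of the "vanilla linear regression over W" baseline, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_d1(n, phi1=0.5, phi2=0.5):
    """Draw D1 = {(W, T, c)}: Gaussian covariate, logistic treatment, poly outcome."""
    W = rng.normal(size=n)                            # W ~ N(0, 1)
    p = 1.0 / (1.0 + np.exp(phi1 * W + phi2))         # pi_b(W) = (1 + e^{phi1 W + phi2})^{-1}
    T = rng.binomial(1, p)                            # confounded treatment
    feats = np.stack([np.ones(n), W, T, W**2, W * T, T**2], axis=1)
    beta = np.array([5.0, 1.0, 1.0, 2.0, 2.0, 1.0])   # poly(t, w) coefficients
    c = feats @ beta + rng.normal(size=n)             # c_t(w) = poly(t, w) + eps
    return W, T, c

# Mis-specified direct method: regress c on W alone (no polynomial expansion),
# so the model stays wrong no matter how large n grows.
W, T, c = generate_d1(5000)
X = np.stack([np.ones_like(W), W], axis=1)
coef, *_ = np.linalg.lstsq(X, c, rcond=None)
print("mis-specified linear fit:", coef)
```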