Trustworthy Policy Learning under the Counterfactual No-Harm Criterion
Authors: Haoxuan Li, Chunyuan Zheng, Yixiao Cao, Zhi Geng, Yue Liu, Peng Wu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted to show the effectiveness of the proposed policy learning approach for satisfying the counterfactual no-harm criterion. |
| Researcher Affiliation | Academia | ¹Center for Data Science, Peking University; ²Department of Mathematics, University of California, San Diego; ³School of Mathematics and Statistics, Beijing Technology and Business University; ⁴Center for Applied Statistics and School of Statistics, Renmin University of China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Following previous studies (Shalit et al., 2017; Louizos et al., 2017; Yoon et al., 2018; Yao et al., 2018), we conduct extensive experiments on one semi-synthetic dataset, IHDP, and one real-world dataset, JOBS. The IHDP dataset (Hill, 2011) is based on the Infant Health and Development Program (IHDP)... The JOBS dataset (LaLonde, 1986) is based on the National Supported Work program... |
| Dataset Splits | Yes | Let K be a small positive integer, and (for simplicity) suppose that m = n/K is also an integer. Let I_1, ..., I_K be a random partition of the index set I = {1, ..., n} so that #I_k = m for k = 1, ..., K. Denote I_k^c as the complement of I_k. ...This is the cross-fitting approach to machine-learning-aided causal inference advocated by Chernozhukov et al. (2018), which is prevalent in much of the recent causal inference literature (Wager & Athey, 2018; Athey et al., 2019; Semenova & Chernozhukov, 2021). A minimal cross-fitting sketch is given below the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | In addition to using the AIPW estimator to estimate the policy reward and counterfactual harm upper bound in Section 6, we further explore the use of outcome regression (OR) and inverse probability weighting (IPW) as alternative estimators for policy learning. The paper names these estimator components (AIPW, OR, IPW) but does not specify libraries or version numbers for them or any other software dependencies. A sketch of the three estimators is given below the table. |
| Experiment Setup | Yes | Thus, we simulate potential outcomes based on the covariates as follows: Y_i(0) ~ Bern(σ(w_0 x_i + ε_{0,i})) and Y_i(1) ~ Bern(σ(w_1 x_i + ε_{1,i})), where σ(·) is the sigmoid function, w_0 ~ N_[-1,1](0, 1) follows a truncated normal distribution, w_1 ~ Unif(-1, 1) follows a uniform distribution, ε_{0,i} ~ N(α_0, 1), and ε_{1,i} ~ N(α_1, 1). We set the noise parameters α_0 = 1 and α_1 = 3 for IHDP and α_0 = 0 and α_1 = 2 for JOBS. A sketch of this simulation is given below the table. |
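
The K-fold sample splitting described in the Dataset Splits row can be illustrated with a short cross-fitting loop. This is a minimal sketch, not code from the paper; the helper `fit_nuisance` and the arrays `X`, `T`, `Y` are hypothetical placeholders.

```python
import numpy as np

def cross_fit_indices(n, K, seed=0):
    """Randomly partition {0, ..., n-1} into K folds of (roughly) equal size m = n / K."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), K)

# Each nuisance model is fit on the complement of a fold (I_k^c) and evaluated on the
# held-out fold (I_k), so every unit receives an out-of-fold nuisance prediction.
n, K = 1000, 5                                     # illustrative sizes, not from the paper
folds = cross_fit_indices(n, K)
for fold in folds:
    train_idx = np.setdiff1d(np.arange(n), fold)   # I_k^c: indices used for fitting
    eval_idx = fold                                # I_k: indices used for prediction
    # model = fit_nuisance(X[train_idx], T[train_idx], Y[train_idx])  # hypothetical helper
    # predictions[eval_idx] = model.predict(X[eval_idx])
```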
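
The Software Dependencies row mentions OR, IPW, and AIPW estimators of the policy value. The sketch below writes out the standard form of each estimator under common notation (policy probabilities `pi`, propensity scores `e_hat`, outcome regressions `mu0_hat`/`mu1_hat`); it is an assumption-based illustration, not the paper's implementation.

```python
import numpy as np

def policy_value_estimates(pi, t, y, e_hat, mu0_hat, mu1_hat):
    """OR, IPW, and AIPW estimates of the value of a (stochastic) policy.

    pi                : policy treatment probabilities pi(X) in [0, 1]
    t, y              : observed binary treatments and outcomes
    e_hat             : estimated propensity scores e(X) = P(T = 1 | X)
    mu0_hat, mu1_hat  : estimated outcome regressions E[Y | X, T = 0] and E[Y | X, T = 1]
    """
    # Outcome regression (OR): plug in the fitted outcome models.
    v_or = np.mean(pi * mu1_hat + (1 - pi) * mu0_hat)

    # Inverse probability weighting (IPW): reweight observed outcomes by the propensity.
    v_ipw = np.mean(pi * t * y / e_hat + (1 - pi) * (1 - t) * y / (1 - e_hat))

    # Augmented IPW (AIPW): OR plus an IPW correction of its residuals (doubly robust),
    # typically combined with cross-fitted nuisance estimates as in the previous sketch.
    v_aipw = np.mean(
        pi * (mu1_hat + t * (y - mu1_hat) / e_hat)
        + (1 - pi) * (mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat))
    )
    return v_or, v_ipw, v_aipw
```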
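
The outcome-generating process quoted in the Experiment Setup row can be written out directly. The sketch below assumes a covariate matrix `X` of shape (n, d) and draws the truncated normal weights by rejection sampling; it follows the quoted equations but is not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_potential_outcomes(X, alpha0, alpha1, seed=0):
    """Draw Y_i(0) ~ Bern(sigmoid(w_0 x_i + eps_{0,i})) and Y_i(1) ~ Bern(sigmoid(w_1 x_i + eps_{1,i}))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # w_0 ~ N(0, 1) truncated to [-1, 1], drawn here by simple rejection sampling.
    w0 = rng.normal(0, 1, size=d)
    while np.any(np.abs(w0) > 1):
        bad = np.abs(w0) > 1
        w0[bad] = rng.normal(0, 1, size=bad.sum())

    # w_1 ~ Unif(-1, 1).
    w1 = rng.uniform(-1, 1, size=d)

    # Noise terms eps_{0,i} ~ N(alpha_0, 1) and eps_{1,i} ~ N(alpha_1, 1).
    eps0 = rng.normal(alpha0, 1, size=n)
    eps1 = rng.normal(alpha1, 1, size=n)

    y0 = rng.binomial(1, sigmoid(X @ w0 + eps0))
    y1 = rng.binomial(1, sigmoid(X @ w1 + eps1))
    return y0, y1

# Noise parameters from the paper: alpha0 = 1, alpha1 = 3 for IHDP; alpha0 = 0, alpha1 = 2 for JOBS.
```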