Proximal Causal Learning of Conditional Average Treatment Effects
Authors: Erik Sverdrup, Yifan Cui
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate the promise of the P-learner we consider a simple motivating example that highlights some salient features (to the best of our knowledge, there are no other current proposals for proximal CATE estimation, making traditional benchmark comparisons challenging). We design a proximal data generating mechanism using the setup from Cui et al. (2023), where we incorporate treatment heterogeneity using the moderately complex CATE function τ(X) = exp(X^(1)) − 3X^(2) used in Shen & Cui (2022) to learn proximal treatment regimes, and add three additional irrelevant normally distributed covariates X (the complete setup is described in Appendix B). In the left-most plot in Figure 2 we train a Causal Forest (Athey et al., 2019), a popular method for estimating CATEs under conditional exchangeability, on data with n_train = 4000 samples and predict the estimated CATEs on a test set with n_test = 2000. [A minimal grf sketch of this baseline appears after this table.] |
| Researcher Affiliation | Academia | ¹Graduate School of Business, Stanford University, Stanford, USA. ²Center for Data Science, Zhejiang University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1. (P-learner) Step 1. Split the data, i = 1 . . . n, into C evenly sized folds. Estimate h(w, a, x) and q(z, a, x) with cross-fitting over the C folds, using tuning as appropriate. Step 2. Form the scores (3) using cross-fit plug-in estimates of nuisance components ĥ^(−c(i))(w, a, x) and q̂^(−c(i))(z, a, x)... [A cross-fitting skeleton is sketched after this table.] |
| Open Source Code | No | The paper references third-party software such as 'glmnet (Friedman et al., 2010; R Core Team, 2022)' and 'grf: Generalized Random Forests, 2022. URL https://github.com/grf-labs/grf. R package version 2.2.1.', but provides no link to, or explicit statement about, an open-source release of the authors' own P-learner implementation. |
| Open Datasets | Yes | We design a proximal data generating mechanism using the setup from Cui et al. (2023)... (the complete setup is described in Appendix B). |
| Dataset Splits | Yes | Step 1. Split the data, i = 1 . . . n, into C evenly sized folds. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software like 'glmnet', 'XGBoost', and 'grf: Generalized Random Forests, R package version 2.2.1'. While 'grf' has a version, specific version numbers for other key software components such as glmnet or XGBoost are not provided, preventing full reproducibility of the software environment. |
| Experiment Setup | Yes | For the Lasso learner, we use the same spline-based featurization for the continuous covariates X_i and just interactions for the binary X_i. [A glmnet featurization sketch appears after this table.] |
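
To make the Causal Forest baseline quoted under Research Type concrete, the sketch below follows the grf workflow at the stated sample sizes. The proximal data-generating mechanism from Cui et al. (2023) / Appendix B is not reproduced in the excerpt, so `simulate_proximal_data()` is a hypothetical placeholder, not the paper's code.

```r
# Minimal sketch of the Causal Forest baseline, using the grf R package
# cited in the paper (version 2.2.1). Assumptions are flagged in comments.
library(grf)

n_train <- 4000
n_test  <- 2000

# Hypothetical placeholder for the proximal DGP of Appendix B: assumed to
# return a list with covariates X, treatment W, and outcome Y.
data_train <- simulate_proximal_data(n_train)
data_test  <- simulate_proximal_data(n_test)

# Train a Causal Forest (Athey et al., 2019) under conditional
# exchangeability and predict CATEs on the held-out test set.
cf      <- causal_forest(X = data_train$X, Y = data_train$Y, W = data_train$W)
tau_hat <- predict(cf, data_test$X)$predictions
```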
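The Algorithm 1 excerpt under Pseudocode describes a standard cross-fitting scheme. Below is a minimal skeleton of Step 1, assuming a generic data frame `data` and hypothetical nuisance fitters `fit_h()` / `fit_q()`; the score in equation (3) is not reproduced in the excerpt, so Step 2 is indicated only in comments.

```r
# Step 1 of Algorithm 1 (sketch): split i = 1..n into C evenly sized folds
# and cross-fit the nuisance functions h(w, a, x) and q(z, a, x).
C <- 5                                   # number of folds (illustrative)
n <- nrow(data)
fold_id <- sample(rep(seq_len(C), length.out = n))

h_hat <- vector("list", C)
q_hat <- vector("list", C)
for (c in seq_len(C)) {
  train <- data[fold_id != c, ]
  # Hypothetical fitters; tune hyperparameters as appropriate.
  h_hat[[c]] <- fit_h(train)
  q_hat[[c]] <- fit_q(train)
}

# Step 2 (not shown): for each i, form the score in equation (3) using the
# estimates fit without i's fold, i.e. h_hat[[fold_id[i]]], q_hat[[fold_id[i]]].
```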
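Finally, the Experiment Setup row mentions a spline-based featurization for the Lasso learner. A sketch using the `splines` base package and `cv.glmnet` is below, assuming inputs `X_cont` (matrix of continuous covariates), `X_bin` (data frame of binary covariates), and outcome `y`; the degrees of freedom and the exact interaction set are illustrative assumptions, not taken from the paper.

```r
library(splines)
library(glmnet)

# Natural-spline basis for each continuous covariate (df = 4 is illustrative).
spline_basis <- do.call(cbind, lapply(seq_len(ncol(X_cont)),
                                      function(j) ns(X_cont[, j], df = 4)))

# Main effects plus pairwise interactions for the binary covariates.
binary_terms <- model.matrix(~ .^2 - 1, X_bin)

features <- cbind(spline_basis, binary_terms)

# Cross-validated Lasso (alpha = 1) on the expanded design matrix.
fit <- cv.glmnet(features, y, alpha = 1)
```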