Proximal Causal Learning of Conditional Average Treatment Effects
Authors: Erik Sverdrup, Yifan Cui
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To illustrate the promise of the P-learner we consider a simple motivating example that highlights some salient features (to the best of our knowledge, there are no other current proposals for proximal CATE estimation, making traditional benchmark comparisons challenging). We design a proximal data generating mechanism using the setup from Cui et al. (2023), where we incorporate treatment heterogeneity using the moderately complex CATE function τ(X) = exp(X^(1)) − 3X^(2) used in Shen & Cui (2022) to learn proximal treatment regimes, and add three additional irrelevant normally distributed covariates X (the complete setup is described in Appendix B). In the left-most plot in Figure 2 we train a Causal Forest (Athey et al., 2019), a popular method for estimating CATEs under conditional exchangeability, on data with n_train = 4000 samples and predict the estimated CATEs on a test set with n_test = 2000. [A minimal grf sketch of this baseline appears after this table.] |
| Researcher Affiliation | Academia | ¹Graduate School of Business, Stanford University, Stanford, USA. ²Center for Data Science, Zhejiang University, Hangzhou, China. |
| Pseudocode | Yes | Algorithm 1. (P-learner) Step 1. Split the data, i = 1 . . . n, into C evenly sized folds. Estimate h(w, a, x) and q(z, a, x) with cross-fitting over the C folds, using tuning as appropriate. Step 2. Form the scores (3) using cross-fit plug-in estimates of nuisance components ĥ^(−c(i))(w, a, x) and q̂^(−c(i))(z, a, x)... [A cross-fitting skeleton is sketched after this table.] |
| Open Source Code | No | The paper references third-party software such as 'glmnet (Friedman et al., 2010; R Core Team, 2022)' and 'grf: Generalized Random Forests, 2022. URL https://github.com/grf-labs/grf. R package version 2.2.1.', but provides no link to, or explicit statement about, an open-source release of the authors' own P-learner implementation. |
| Open Datasets | Yes | We design a proximal data generating mechanism using the setup from Cui et al. (2023)... (the complete setup is described in Appendix B). |
| Dataset Splits | Yes | Step 1. Split the data, i = 1 . . . n, into C evenly sized folds. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software like 'glmnet', 'XGBoost', and 'grf: Generalized Random Forests, R package version 2.2.1'. While 'grf' has a version, specific version numbers for other key software components such as glmnet or XGBoost are not provided, preventing full reproducibility of the software environment. |
| Experiment Setup | Yes | For the Lasso learner, we use the same spline-based featurization for the continuous covariates X_i and just interactions for the binary X_i. [A glmnet featurization sketch appears after this table.] |
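
To make the Causal Forest baseline quoted under Research Type concrete, the sketch below follows the grf workflow at the stated sample sizes. The proximal data-generating mechanism from Cui et al. (2023) / Appendix B is not reproduced in the excerpt, so `simulate_proximal_data()` is a hypothetical placeholder, not the paper's code.

```r
# Minimal sketch of the Causal Forest baseline, using the grf R package
# cited in the paper (version 2.2.1). Assumptions are flagged in comments.
library(grf)

n_train <- 4000
n_test  <- 2000

# Hypothetical placeholder for the proximal DGP of Appendix B: assumed to
# return a list with covariates X, treatment W, and outcome Y.
data_train <- simulate_proximal_data(n_train)
data_test  <- simulate_proximal_data(n_test)

# Train a Causal Forest (Athey et al., 2019) under conditional
# exchangeability and predict CATEs on the held-out test set.
cf      <- causal_forest(X = data_train$X, Y = data_train$Y, W = data_train$W)
tau_hat <- predict(cf, data_test$X)$predictions
```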
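The Algorithm 1 excerpt under Pseudocode describes a standard cross-fitting scheme. Below is a minimal skeleton of Step 1, assuming a generic data frame `data` and hypothetical nuisance fitters `fit_h()` / `fit_q()`; the score in equation (3) is not reproduced in the excerpt, so Step 2 is indicated only in comments.

```r
# Step 1 of Algorithm 1 (sketch): split i = 1..n into C evenly sized folds
# and cross-fit the nuisance functions h(w, a, x) and q(z, a, x).
C <- 5                                   # number of folds (illustrative)
n <- nrow(data)
fold_id <- sample(rep(seq_len(C), length.out = n))

h_hat <- vector("list", C)
q_hat <- vector("list", C)
for (c in seq_len(C)) {
  train <- data[fold_id != c, ]
  # Hypothetical fitters; tune hyperparameters as appropriate.
  h_hat[[c]] <- fit_h(train)
  q_hat[[c]] <- fit_q(train)
}

# Step 2 (not shown): for each i, form the score in equation (3) using the
# estimates fit without i's fold, i.e. h_hat[[fold_id[i]]], q_hat[[fold_id[i]]].
```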
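Finally, the Experiment Setup row mentions a spline-based featurization for the Lasso learner. A sketch using the `splines` base package and `cv.glmnet` is below, assuming inputs `X_cont` (matrix of continuous covariates), `X_bin` (data frame of binary covariates), and outcome `y`; the degrees of freedom and the exact interaction set are illustrative assumptions, not taken from the paper.

```r
library(splines)
library(glmnet)

# Natural-spline basis for each continuous covariate (df = 4 is illustrative).
spline_basis <- do.call(cbind, lapply(seq_len(ncol(X_cont)),
                                      function(j) ns(X_cont[, j], df = 4)))

# Main effects plus pairwise interactions for the binary covariates.
binary_terms <- model.matrix(~ .^2 - 1, X_bin)

features <- cbind(spline_basis, binary_terms)

# Cross-validated Lasso (alpha = 1) on the expanded design matrix.
fit <- cv.glmnet(features, y, alpha = 1)
```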