Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multi-Accurate CATE is Robust to Unknown Covariate Shifts
Authors: Christoph Kern, Michael P. Kim, Angela Zhou
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a thorough empirical study comparing finite- and large-sample performance of multi-accurate learning and other causal machine learning techniques more specifically tailored for causal structure. In Section 5, we provide extensive empirical comparisons in simulated data and a case study of the Women's Health Initiative parallel clinical trial and observational study. |
| Researcher Affiliation | Academia | Christoph Kern EMAIL Department of Statistics Ludwig-Maximilians-University of Munich Munich Center for Machine Learning (MCML); Michael Kim EMAIL Department of Computer Science Cornell University; Angela Zhou EMAIL Department of Data Sciences and Operations University of Southern California |
| Pseudocode | Yes | Algorithm 1 Multi-accuracy for CATE estimation for Setting 1, unknown covariate shifts; Algorithm 2 Multi-accuracy for CATE estimation for calibrating CATE on small Randomized Controlled Trial data; Algorithm 3 Multi-accurate DR-learner (Equation (8)) for unknown covariate shift; Algorithm 4 MCBoost |
| Open Source Code | Yes | We provide code of the simulation studies and the real data application for replication purposes in the following public OSF repository: https://osf.io/zxjvw/?view_only=a622c123414e4be6a218f121ded191d3 |
| Open Datasets | Yes | We next present a case study that draws on data from the Women's Health Initiative (WHI) studies (Machens and Schmidt-Gollwitzer, 2003). |
| Dataset Splits | Yes | The size of the (audit/RCT) data used for multi-calibration boosting (500 observations) and the (test) data used for model evaluation (5000 observations) is fixed. We vary the shift intensity s ∈ {0, 0.25, ..., 2} and training set size ∈ {500, 2000, 3500, 5000}, and run experiments for each combination 25 times. We start with the observational study (OS) (52,335 observations) and draw a random 50% sample that serves as observational training data for (naive) CATE estimation. We split the clinical trial data (14,531 observations) into an initial 50% training set and a 50% test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It mentions R packages for implementation but no hardware. |
| Software Dependencies | Yes | Data preparations, model training and evaluation are conducted in R (3.6.3) (R Core Team, 2020) using the packages ranger (0.13.1) (Wright and Ziegler, 2017), grf (2.0.2) (Tibshirani et al., 2021) and rlearner (1.1.0) (Nie and Wager, 2020). The simulation studies heavily draw on the causal experiment simulator of the causalToolbox (0.0.2.000) (Künzel et al., 2019) package. In all experiments, (initial) T-learner and DR-learner are post-processed using the MCBoost algorithm as implemented in the mcboost (0.4.2) (Pfisterer et al., 2021) package. |
| Experiment Setup | Yes | Table 2: Hyperparameter settings for post-processing using MCBoost.; Table 3: Hyperparameter settings of (baseline) CATE learners.; Table 13: Hyperparameter settings for post-processing using MCBoost.; Table 14: Hyperparameter settings of (baseline) CATE learners. These tables specify various hyperparameters such as max_iter, alpha, eta, num.trees, mtry, sample.fraction, honesty.fraction, min.node.size, and maxdepth. |