Prediction-powered Generalization of Causal Inferences
Authors: Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We simulate over a thousand data-generating processes and find that our estimators yield remarkable improvements when the observational data is high-quality and maintain baseline performance when it is not (Section 5)... We compare the root MSE (RMSE) of our estimators (14) and (25), which combine experimental and observational data, to that of the baselines (4) and (9), which use them alone. |
| Researcher Affiliation | Academia | (1) MIT CSAIL; (2) Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard; (3) Department of Computational Precision Health, UC Berkeley and UCSF. |
| Pseudocode | Yes | Algorithm 1: Generalization via additive bias correction; Algorithm 2: Generalization via augmented outcome model. |
| Open Source Code | Yes | Our code is available at https://github.com/demireal/ppci. |
| Open Datasets | No | We simulate over a thousand different synthetic data generating processes... We consider two covariates $X, U \in [-1, 1]$... We first describe the probabilistic model that generates the potential outcomes... We generate $\mathrm{FOM}_a$ by sampling from a GP... The paper generates its own synthetic data rather than using an externally accessible public dataset. |
| Dataset Splits | Yes | We fit $f_1(X)$ from the OS with a neural network, and $g_1(X; \hat{\theta})$, $b_1(X; \hat{\gamma})$, $h_1(X; \hat{\beta})$ are fit from the trial sample $D_1$ via polynomial ridge regression with 5-fold cross-validation. |
| Hardware Specification | No | The paper describes synthetic experiments and data generation but does not specify any hardware details like GPU models, CPU types, or cloud computing resources used for running the experiments. |
| Software Dependencies | No | The paper mentions fitting models with a 'neural network' and 'polynomial ridge regression' and states 'Our code is available at https://github.com/demireal/ppci.', but it does not provide specific software dependencies with version numbers in the text. |
| Experiment Setup | Yes | We use trial sizes $n_1 \in \{200, 1000\}$ and $l_x^{\mathrm{FOM}_1} \in \{0.5, 0.2\}$... The sample size for the observational study (OS) and the target sample are set to 50,000 and 20,000, respectively. We fit $f_1(X)$ from the OS with a neural network, and $g_1(X; \hat{\theta})$, $b_1(X; \hat{\gamma})$, $h_1(X; \hat{\beta})$ are fit from the trial sample $D_1$ via polynomial ridge regression with 5-fold cross-validation. |
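
The "polynomial ridge regression with 5-fold cross-validation" step quoted in the table can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the linked repository); the function names, the candidate degrees and penalties, the single covariate, and the toy data-generating process are all assumptions made for illustration. Only the 5-fold split and the trial size of 200 are taken from the quoted setup.

```python
import numpy as np

def poly_features(x, degree):
    """Return columns [1, x, x^2, ..., x^degree] for a 1-D input array."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(Phi, y, alpha):
    """Closed-form ridge solution; the intercept column (index 0) is not penalized."""
    penalty = alpha * np.eye(Phi.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)

def cv_mse(x, y, degree, alpha, n_folds=5, seed=0):
    """Mean held-out squared error across n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = fit_ridge(poly_features(x[train_idx], degree), y[train_idx], alpha)
        pred = poly_features(x[test_idx], degree) @ w
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))

def select_and_fit(x, y, degrees=(1, 2, 3, 4), alphas=(1e-3, 1e-2, 1e-1, 1.0)):
    """Pick (degree, alpha) by 5-fold CV, then refit on all data.

    The candidate grids are hypothetical, not values reported in the paper.
    """
    grid = [(d, a) for d in degrees for a in alphas]
    best_degree, best_alpha = min(grid, key=lambda p: cv_mse(x, y, p[0], p[1]))
    w = fit_ridge(poly_features(x, best_degree), y, best_alpha)
    return best_degree, best_alpha, w

# Toy "trial sample": 200 points on [-1, 1] with a quadratic outcome (assumed DGP).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=200)

degree, alpha, w = select_and_fit(x, y)
print("selected degree:", degree, "selected alpha:", alpha)
```

Reusing the same fold assignment (fixed seed) for every candidate keeps the comparison between degrees and penalties paired, so differences in held-out error reflect the model rather than the split.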