Prediction-powered Generalization of Causal Inferences

Authors: Ilker Demirel, Ahmed Alaa, Anthony Philippakis, David Sontag

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We simulate over a thousand data-generating processes and find that our estimators yield remarkable improvements when the observational data is high-quality and maintain baseline performance when it is not (Section 5)." and "We compare the root MSE (RMSE) of our estimators (14) and (25), which combine experimental and observational data, to that of the baselines (4) and (9) which use them alone." (See the RMSE sketch after the table.)
Researcher Affiliation | Academia | "¹MIT CSAIL, ²Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, ³Department of Computational Precision Health, UC Berkeley and UCSF."
Pseudocode | Yes | "Algorithm 1: Generalization via additive bias correction" and "Algorithm 2: Generalization via augmented outcome model" (a hedged sketch of the additive bias correction idea appears after the table).
Open Source Code | Yes | "Our code is available at https://github.com/demireal/ppci."
Open Datasets | No | "We simulate over a thousand different synthetic data generating processes... We consider two covariates X, U ∈ [−1, 1]... We first describe the probabilistic model that generates the potential outcomes... We generate FOM_a by sampling from a GP..." The paper generates its own synthetic data rather than using an externally accessible public dataset (see the data-generation sketch after the table).
Dataset Splits | Yes | "We fit f1(X) from the OS with a neural network, and g1(X; θ̂), b1(X; γ̂), h1(X; β̂) are fit from the trial sample D1 via polynomial ridge regression with 5-fold cross-validation." (See the model-fitting sketch after the table.)
Hardware Specification | No | The paper describes synthetic experiments and data generation but does not specify any hardware details like GPU models, CPU types, or cloud computing resources used for running the experiments.
Software Dependencies | No | The paper mentions fitting models with a 'neural network' and 'polynomial ridge regression' and states 'Our code is available at https://github.com/demireal/ppci.', but it does not provide specific software dependencies with version numbers in the text.
Experiment Setup | Yes | "We use trial sizes n1 ∈ {200, 1000} and ℓ_x^FOM1 ∈ {0.5, 0.2}... The sample size for the observational study (OS) and the target sample are set to 50,000 and 20,000, respectively. We fit f1(X) from the OS with a neural network, and g1(X; θ̂), b1(X; γ̂), h1(X; β̂) are fit from the trial sample D1 via polynomial ridge regression with 5-fold cross-validation."
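
To make the Open Datasets row concrete, below is a minimal sketch of one way to simulate a data-generating process of the kind the quote describes: uniform covariates X, U on [−1, 1] and an outcome function drawn from a Gaussian-process (GP) prior. The RBF kernel, the noise level, and the additive role of U are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def sample_gp_function(grid, lengthscale=0.5, seed=0):
    """Draw one function from a zero-mean GP prior with an RBF kernel,
    evaluated on `grid`; values at new points come from linear interpolation."""
    rng = np.random.default_rng(seed)
    diffs = grid[:, None] - grid[None, :]
    K = np.exp(-0.5 * (diffs / lengthscale) ** 2) + 1e-8 * np.eye(len(grid))
    f_vals = rng.multivariate_normal(np.zeros(len(grid)), K)
    return lambda x: np.interp(x, grid, f_vals)

rng = np.random.default_rng(42)
n = 50_000                                        # OS sample size quoted above
X = rng.uniform(-1.0, 1.0, n)                     # observed covariate
U = rng.uniform(-1.0, 1.0, n)                     # second covariate
grid = np.linspace(-1.0, 1.0, 200)
fom1 = sample_gp_function(grid, lengthscale=0.5)  # GP draw standing in for FOM_1
Y1 = fom1(X) + 0.5 * U + rng.normal(0.0, 0.1, n)  # hypothetical potential outcome
```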
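The Dataset Splits and Experiment Setup rows both quote fitting g1, b1, and h1 on the trial sample via polynomial ridge regression with 5-fold cross-validation. A minimal scikit-learn sketch of that fitting step follows; the polynomial degree, the alpha grid, and the synthetic trial data are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import RidgeCV

# Hypothetical trial sample D1 (n1 = 1000, one of the quoted trial sizes).
rng = np.random.default_rng(0)
X1 = rng.uniform(-1.0, 1.0, (1000, 1))
y1 = np.sin(2.0 * X1[:, 0]) + rng.normal(0.0, 0.1, 1000)

# Polynomial ridge regression with the penalty chosen by 5-fold CV.
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),  # degree is an assumption
    RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5),
)
model.fit(X1, y1)
predictions = model.predict(X1)  # e.g., a fitted bias term b1(X; gamma-hat)
```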
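The Pseudocode row names Algorithm 1, generalization via additive bias correction. The sketch below captures only the high-level idea suggested by that title, i.e. an outcome model fit on the OS plus a bias term fit on trial residuals, averaged over the target sample. It is not the authors' exact algorithm, and every function name and interface here is hypothetical.

```python
import numpy as np

def generalize_bias_corrected(f1, fit_regressor, X_trial, Y_trial, X_target):
    """Sketch of generalization via additive bias correction.

    f1:            outcome model fit on the observational study (OS)
    fit_regressor: routine mapping (X, targets) -> a callable regressor,
                   e.g., the polynomial ridge pipeline sketched above
    """
    residuals = Y_trial - f1(X_trial)       # trial residuals against the OS model
    b1 = fit_regressor(X_trial, residuals)  # additive bias model b1(X)
    return np.mean(f1(X_target) + b1(X_target))  # corrected target-mean estimate
```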
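Finally, the Research Type row quotes an RMSE comparison between the combined estimators and the single-source baselines. For reference, RMSE across simulation replicates reduces to the formula below; the variable names are illustrative.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean squared error of point estimates across simulated DGPs."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

# Usage: rmse(combined_estimates, true_effect) vs. rmse(baseline_estimates, true_effect)
```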