Validating Causal Inference Models via Influence Functions

Authors: Ahmed Alaa, Mihaela Van Der Schaar

ICML 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 77 benchmark datasets show that using our procedure, we can accurately predict the comparative performances of state-of-the-art causal inference methods applied to a given observational study. |
| Researcher Affiliation | Academia | (1) University of California, Los Angeles, USA; (2) University of Cambridge, Cambridge, UK; (3) Alan Turing Institute, London, UK. |
| Pseudocode | No | The paper provides a 'high-level description of our procedure' with numbered steps but it is not formally labeled as 'Pseudocode' or 'Algorithm' and is not formatted in a code-like manner. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include a link to a code repository. |
| Open Datasets | Yes | We conducted extensive experiments on benchmark datasets released by the Atlantic Causal Inference Competition (Hill, 2016)... Those realizations were generated by the competition organizers and are publicly accessible (Hill, 2016). |
| Dataset Splits | Yes | For each realization, we divide the data into 80/20 train/test splits, and use training data to predict the PEHE of the 10 candidate models via 5-fold influence function-based validation. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU model, CPU model) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'XGBoost' but does not specify any version numbers for this or any other software dependency. |
| Experiment Setup | Yes | We use two XGBoost regression models for $\mu_{p,0}$ and $\mu_{p,1}$, and then calculate $\tilde{T}_p = \mu_{p,1} - \mu_{p,0}$. For $\pi_p$, we use an XGBoost classifier. Our choice of XGBoost is motivated by its minimax optimality (Linero & Yang, 2018), which is required by Theorem 1. ... In all experiments, we set $m = 1$ since higher order influence terms did not improve the results. |
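
The split protocol quoted under Dataset Splits (an 80/20 train/test split, then 5-fold validation on the training portion) can be sketched as index bookkeeping. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the sample size of 1000, the shuffling, and the seed are all assumptions made for the example, and the paper does not state how its folds were constructed.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists.

    The shuffle and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (fit, held_out) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, held_out


# Hypothetical sample size, chosen only for illustration.
train_idx, test_idx = train_test_split_indices(1000)
splits = list(k_fold_indices(train_idx, k=5))
```

In this sketch every training index is held out exactly once across the five folds, and the 20% test portion never enters the validation loop. What is fitted and evaluated on each fold (the plug-in models and the influence-function correction) is deliberately left out.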