Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention

Authors: Jiaqi Zhang, Joel Jennings, Agrin Hilmkil, Nick Pawlowski, Cheng Zhang, Chao Ma

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we verify the correctness of our theory and demonstrate the effectiveness of our algorithm on both synthetic and real-world datasets. Importantly, in the context of zero-shot causal inference on unseen datasets, we observed competitive and in certain cases better performance to traditional per-dataset causal inference approaches, while achieving substantial reductions in inference time. We study the performance of CInA on causal inference tasks using both synthetic and real-world datasets."
Researcher Affiliation | Collaboration | "1 Massachusetts Institute of Technology, 2 Google DeepMind, 3 Work done while at Microsoft, 4 Microsoft Research Cambridge."
Pseudocode | Yes | "Algorithm 1: Causal Inference with Attention (CInA) ... Algorithm 2: CInA (multi-dataset version). ... Algorithm 3: Direct Inference with CInA."
Open Source Code | Yes | "Code for our method can be found at https://github.com/microsoft/causica/tree/main/research_experiments/cina."
Open Datasets | Yes | "The Infant Health and Development Program (IHDP) dataset is a semi-synthetic dataset compiled by Hill (2011). We use the existing versions from Chernozhukov et al. (2022)... Twins. Introduced by Louizos et al. (2017), this is a semi-synthetic dataset based on the real data on twin births and twin mortality rates in the US from 1989 to 1991 (Almond et al., 2005). ... We also use the datasets from LaLonde (1986)... The data for the 2018 Atlantic Causal Inference Conference competition (ACIC) (Shimoni et al., 2018) comprises several semi-synthetic datasets derived from the linked birth and infant death (LBIDD) data (MacDorman & Atkinson, 1998)."
Dataset Splits | Yes | "We generate 100 different datasets (split into 60/20/20 for training/validation/testing). ... Our multi-dataset model, CInA (ZS), is trained on 60 training datasets, with hyperparameters selected using 20 validation sets."
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper mentions software components such as a random forest classifier and libraries for DML and SVM (e.g., scikit-learn), but does not provide specific version numbers for these dependencies or for the programming languages used.
Experiment Setup | Yes | "Hyper-parameters. For both Algorithm 1 and Algorithm 2, we search for the optimal penalty λ > 0 from range [λmin, λmax]... For all the experiments, we use a cosine annealing schedule for the learning rate from lmax to lmin during the first half of the training epochs. Then the learning rate is fixed to lmin for the second half of the training epochs. ... For Algorithm 1, we train for 20,000 epochs on all datasets. For Algorithm 2, we train for 4,000 epochs on all datasets."
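The learning-rate schedule described in the Experiment Setup row (cosine annealing from lmax to lmin over the first half of training, then constant at lmin) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the concrete values of `l_max`, `l_min`, and `total_epochs` are hypothetical, since the excerpt does not state them.

```python
import math

def lr_at_epoch(epoch: int, total_epochs: int, l_max: float, l_min: float) -> float:
    """Cosine-anneal the learning rate from l_max down to l_min over the
    first half of training, then hold it constant at l_min, per the
    schedule described in the paper's hyper-parameter section."""
    half = total_epochs // 2
    if epoch >= half:
        return l_min  # second half: fixed at l_min
    # first half: cosine decay; cos_term goes from 1 (epoch 0) to 0 (epoch half)
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / half))
    return l_min + (l_max - l_min) * cos_term

# Example with hypothetical values (20,000 epochs matches Algorithm 1's budget):
# lr_at_epoch(0, 20_000, 1e-2, 1e-4) gives l_max; any epoch >= 10_000 gives l_min.
```

A schedule of this shape is also available off-the-shelf (e.g., PyTorch's `CosineAnnealingLR` covers the decay phase), but a hand-rolled function makes the two-phase structure explicit.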