Proximal Causal Inference With Text Data

Authors: Jacob Chen, Rohit Bhattacharya, Katherine Keith

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method in synthetic and semi-synthetic settings, the latter with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction, and find that our method produces estimates with low bias."
Researcher Affiliation | Academia | Jacob M. Chen, Department of Computer Science, Johns Hopkins University, jchen459@jhu.edu; Rohit Bhattacharya, Department of Computer Science, Williams College, rb17@williams.edu; Katherine A. Keith, Department of Computer Science, Williams College, kak5@williams.edu
Pseudocode | Yes | Algorithm 1 for inferring two text-based proxies
Open Source Code | Yes | Supporting code is available at https://github.com/jacobmchen/proximal_w_text.
Open Datasets | Yes | "For our semi-synthetic experiments, we use MIMIC-III, a deidentified dataset of patients admitted to critical care units at a large tertiary care hospital (Johnson et al., 2016)."
Dataset Splits | Yes | "Following sample splitting from the causal inference literature (Hansen, 2000), we start by splitting the semi-synthetic dataset into two splits, split 1 and split 2, where both splits are 50% of the original dataset."
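The 50/50 sample split described above can be sketched as follows; this is a minimal illustration, not the authors' code, and the dataset size, random seed, and variable names are all hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is an assumption; the paper does not state one

n = 1000  # hypothetical semi-synthetic dataset size
indices = rng.permutation(n)

# Two disjoint splits, each 50% of the original dataset
split_1 = indices[: n // 2]
split_2 = indices[n // 2:]
```

Sample splitting of this kind lets one half of the data be used to fit nuisance models (e.g., the text-based proxy predictors) and the other half to estimate the causal effect, avoiding overfitting bias.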
Hardware Specification | Yes | "To run the experiments in this paper, we used a local server with 64 cores of CPUs and 4 x NVIDIA RTX A6000 48GB GPUs."
Software Dependencies | No | The paper mentions using the scikit-learn library and specific large language models (FLAN-T5 XXL, OLMo-7B-Instruct) but does not provide version numbers for these software components or any other ancillary software.
Experiment Setup | Yes | "Whenever the positivity rate of W is less than 0.2 or greater than 0.8, i.e. there is a class imbalance, we set the hyperparameter class_weight to balanced. ... we set the hyperparameter penalty to None to turn off regularization."