Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Proximal Causal Inference With Text Data

Authors: Jacob Chen, Rohit Bhattacharya, Katherine Keith

NeurIPS 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our method in synthetic and semi-synthetic settings, the latter with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction, and find that our method produces estimates with low bias." |
| Researcher Affiliation | Academia | "Jacob M. Chen, Department of Computer Science, Johns Hopkins University, EMAIL; Rohit Bhattacharya, Department of Computer Science, Williams College, EMAIL; Katherine A. Keith, Department of Computer Science, Williams College, EMAIL" |
| Pseudocode | Yes | "Algorithm 1" for inferring two text-based proxies |
| Open Source Code | Yes | "Supporting code is available at https://github.com/jacobmchen/proximal_w_text." |
| Open Datasets | Yes | "For our semi-synthetic experiments, we use MIMIC-III, a de-identified dataset of patients admitted to critical care units at a large tertiary care hospital (Johnson et al., 2016)." |
| Dataset Splits | Yes | "Following sample splitting from the causal inference literature (Hansen, 2000), we start by splitting the semi-synthetic dataset into two splits, split 1 and split 2, where both splits are 50% of the original dataset." |
| Hardware Specification | Yes | "To run the experiments in this paper, we used a local server with 64 cores of CPUs and 4 x NVIDIA RTX A6000 48GB GPUs." |
| Software Dependencies | No | The paper mentions using the scikit-learn library and specific large language models (FLAN-T5 XXL, OLMo-7B-Instruct) but does not provide version numbers for these software components or any other ancillary software. |
| Experiment Setup | Yes | "Whenever the positivity rate of W is less than 0.2 or greater than 0.8, i.e. there is a class imbalance, we set the hyperparameter `class_weight` to balanced. ... we set the hyperparameter `penalty` to None to turn off regularization." |