Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Proximal Causal Inference With Text Data

Authors: Jacob Chen, Rohit Bhattacharya, Katherine Keith

NeurIPS 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our method in synthetic and semi-synthetic settings, the latter with real-world clinical notes from MIMIC-III and open large language models for zero-shot prediction, and find that our method produces estimates with low bias." |
| Researcher Affiliation | Academia | "Jacob M. Chen, Department of Computer Science, Johns Hopkins University, EMAIL; Rohit Bhattacharya, Department of Computer Science, Williams College, EMAIL; Katherine A. Keith, Department of Computer Science, Williams College, EMAIL" |
| Pseudocode | Yes | "Algorithm 1" for inferring two text-based proxies |
| Open Source Code | Yes | "Supporting code is available at https://github.com/jacobmchen/proximal_w_text." |
| Open Datasets | Yes | "For our semi-synthetic experiments, we use MIMIC-III, a de-identified dataset of patients admitted to critical care units at a large tertiary care hospital (Johnson et al., 2016)." |
| Dataset Splits | Yes | "Following sample splitting from the causal inference literature (Hansen, 2000), we start by splitting the semi-synthetic dataset into two splits, split 1 and split 2, where both splits are 50% of the original dataset." |
| Hardware Specification | Yes | "To run the experiments in this paper, we used a local server with 64 cores of CPUs and 4 x NVIDIA RTX A6000 48GB GPUs." |
| Software Dependencies | No | The paper mentions using the scikit-learn library and specific large language models (FLAN-T5 XXL, OLMo-7B-Instruct) but does not provide version numbers for these software components or any other ancillary software. |
| Experiment Setup | Yes | "Whenever the positivity rate of W is less than 0.2 or greater than 0.8, i.e. there is a class imbalance, we set the hyperparameter `class_weight` to balanced. ... we set the hyperparameter `penalty` to None to turn off regularization." |