Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Graph Transformers on EHRs: Better Representation Improves Downstream Performance
Authors: Raphael Poulain, Rahmatollah Beheshti
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have evaluated the proposed method on a variety of predictive medical tasks on multiple datasets: in-hospital mortality (Mortality) and prolonged length of stay prediction in ICU (Length of Stay) on the MIMIC-IV dataset (Johnson et al., 2021), a dataset from the Beth Israel Deaconess Medical Center containing hospitalization data, and Heart Failure prediction on the All of Us dataset (All of Us Research Program Investigators, 2019), a large publicly-available EHR dataset of adult patients across the US. |
| Researcher Affiliation | Academia | Raphael Poulain, Rahmatollah Beheshti University of Delaware EMAIL |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1) and process flowcharts (Figure 4) but does not provide any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured code-like steps. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/healthylaife/GT-BEHRT |
| Open Datasets | Yes | We have evaluated the proposed method on a variety of predictive medical tasks on multiple datasets: in-hospital mortality (Mortality) and prolonged length of stay prediction in ICU (Length of Stay) on the MIMIC-IV dataset (Johnson et al., 2021)... and Heart Failure prediction on the All of Us dataset (All of Us Research Program Investigators, 2019), a large publicly-available EHR dataset of adult patients across the US. |
| Dataset Splits | Yes | For all experiments, we randomly split the dataset into an 80/10/10 train/validation/test regime and repeated the process five times, each time with a different random seed. |
| Hardware Specification | Yes | all the experiments were run on an NVIDIA T4 GPU. |
| Software Dependencies | No | The paper mentions using 'PyTorch', 'PyHealth', and 'PyTorch Geometric' for implementation. However, it does not specify version numbers for these software dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | For each baseline, as well as the proposed method, we have determined the optimal hyperparameters through grid search. We report a complexity analysis of the models as well as the hyperparameters used throughout the experiments in Table 5. |
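The evaluation protocol quoted above (a random 80/10/10 train/validation/test split, repeated five times with different random seeds) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, the patient count, and the specific seeds are assumptions.

```python
import numpy as np

def split_indices(n_patients, seed, frac=(0.8, 0.1, 0.1)):
    """Shuffle patient indices and cut them into train/val/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_patients)
    n_train = int(frac[0] * n_patients)
    n_val = int(frac[1] * n_patients)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# Repeat the split five times, each with a different seed, as in the paper.
splits = [split_indices(1000, seed) for seed in range(5)]
```

Splitting at the patient level (rather than the visit level) prevents records from the same patient leaking across partitions, which is the standard practice for EHR benchmarks.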