Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment

Authors: Chen Liu, Wenfang Yao, Kejing Yin, William K. Cheung, Jing Qin

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the MIMIC dataset demonstrate that DiPro could effectively extract temporal clinical dynamics and achieve state-of-the-art performance on both disease progression identification and general ICU prediction tasks.
Researcher Affiliation	Academia	Chen Liu1, Wenfang Yao1, Kejing Yin2, William K. Cheung2, Jing Qin1. 1: School of Nursing, The Hong Kong Polytechnic University. 2: Department of Computer Science, Hong Kong Baptist University. Correspondence to: Kejing Yin <EMAIL>
Pseudocode	No	The paper describes the methodology in detail in Sections 2.2, 2.3, and 2.4, and Figure 1 provides an overview of the framework, but there are no explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	The code is available at https://github.com/Chenliu-svg/DiPro. The source code is submitted as supplementary material alongside the paper and will be made publicly available upon acceptance.
Open Datasets	Yes	Datasets. We evaluated DiPro on the large-scale, public dataset, MIMIC [30], which contains de-identified health data of intensive care unit (ICU) admissions. Our study leveraged three derived datasets from the MIMIC ecosystem: (1) MIMIC-IV [30] provides electronic health records (EHR) including demographic information and time-series physiological measurements per ICU stay; (2) MIMIC-CXR [31] contains sequential chest radiographs during ICU hospitalizations; and (3) Chest ImaGenome [32] augments imaging with fine-grained annotations: bounding boxes for anatomical regions and localized change labels (improved, worsened, or no change) between consecutive CXRs.
Dataset Splits	Yes	Our analysis focused on ICU stays containing at least two CXRs. The dataset was partitioned into training (70%), validation (10%), and test sets (20%) at the subject level to prevent data leakage.
Hardware Specification	Yes	The training and validation processes are executed on a server equipped with an RTX 3090-24GB GPU card and a 14 vCPU Intel(R) Xeon(R) Gold 6330 CPU.
Software Dependencies	Yes	The method is implemented using PyTorch 1.9.1 and PyTorch-Lightning 1.4.2 with CUDA 11.1 environment.
Experiment Setup	Yes	The model was trained with base batch sizes of 8 (for disease progression identification) and 4 (for general ICU prediction), using 4-step gradient accumulation to achieve an effective batch size of 32 or 16. Training proceeded for a maximum of 100 epochs with early stopping triggered after 10 epochs without validation improvement. Task-specific selection metrics were employed: macro-F1 for disease identification, accuracy for length-of-stay classification, and AUPRC for mortality prediction. The hyperparameter search spaces for each task are documented in Table 8. AdamW optimizer and CosineAnnealingLR learning rate scheduler are used for training.
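
Two of the entries above lend themselves to a small illustration: the subject-level 70/10/20 partition and the effective batch size obtained through 4-step gradient accumulation. The following is a minimal sketch using only the Python standard library, not the authors' implementation. The function names `split_by_subject` and `effective_batch_size`, the `stay_to_subject` mapping, and the seed are hypothetical and chosen purely for illustration.

```python
import random
from collections import defaultdict


def split_by_subject(stay_to_subject, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign ICU stays to train/val/test so that no subject spans two splits.

    `stay_to_subject` is a hypothetical mapping {stay_id: subject_id}.
    The fractions apply to subjects, not to stays, which is what keeps
    all stays of one patient inside a single split.
    """
    subjects = sorted(set(stay_to_subject.values()))
    random.Random(seed).shuffle(subjects)  # deterministic shuffle for a given seed

    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))

    # Map each subject to exactly one split; the remainder goes to test.
    split_of_subject = {}
    for index, subject in enumerate(subjects):
        if index < n_train:
            split_of_subject[subject] = "train"
        elif index < n_train + n_val:
            split_of_subject[subject] = "val"
        else:
            split_of_subject[subject] = "test"

    # Every stay inherits the split of its subject.
    splits = defaultdict(list)
    for stay, subject in stay_to_subject.items():
        splits[split_of_subject[subject]].append(stay)
    return dict(splits)


def effective_batch_size(base_batch_size, accumulation_steps):
    """Samples contributing to one optimizer step under gradient accumulation."""
    return base_batch_size * accumulation_steps
```

With the values reported in the Experiment Setup row, `effective_batch_size(8, 4)` gives 32 and `effective_batch_size(4, 4)` gives 16. Splitting on subjects rather than stays means the stay-level proportions may deviate slightly from 70/10/20 when patients have differing numbers of stays.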