Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Democratizing Clinical Risk Prediction with Cross-Cohort Cross-Modal Knowledge Transfer

Authors: Qiannan Zhang, Manqi Zhou, Zilong Bai, Chang Su, Fei Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on real-world clinical data validate the effectiveness of our proposed model.
Researcher Affiliation	Academia	Qiannan Zhang, Manqi Zhou, Zilong Bai, Chang Su, Fei Wang Weill Cornell Medicine, Cornell University EMAIL
Pseudocode	Yes	Algorithm 1 The Training Procedure of C3M on the Source Cohort
Open Source Code	Yes	We release the code in https://github.com/graph-ehr/C3M.
Open Datasets	Yes	We leverage the national All of Us Research Platform [39] as the source cohort, and three target cohorts respectively from one local EHR data warehouse and two sub-networks (denoted as INSIGHT-A and INSIGHT-B) from the INSIGHT Clinical Research Network [1] to simulate our setting.
Dataset Splits	Yes	For datasets, the All of Us cohort is randomly split into training/validation/testing at a 6:2:2 ratio.
Hardware Specification	No	fine-tuning and evaluation on the target cohorts are both performed in CPU-only environments to assess practical deployment feasibility.
Software Dependencies	No	The paper mentions software components like 'Adam optimizer', 'GCN', 'Transformer encoder', 'MLP', and various hyperparameters, but does not provide specific version numbers for key software components (e.g., Python, PyTorch, CUDA).
Experiment Setup	Yes	To conduct graph-guided finetuning to obtain phenotypical representations, a two-layer GCN is adopted with 16 hidden units, along with a 16-dimensional embedding layer to represent 2634 medical concept nodes, and one transformation layer that transforms the foundation model output to initialize patient nodes. In addition, the transformer encoder consists of two layers with two heads, and we determine the expert number via search in {1,2,3,4}, while the gene decoder is an MLP with one hidden layer. Attention modulation is achieved using a multi-head attention mechanism with two heads. Both the teacher and student models are implemented as multi-layer perceptrons. The trade-off parameter β for gene feature reconstruction is selected via grid search over {0.01, 0.05, 0.1, 0.5, 1} and set as 0.1. The trade-off parameter λ_KD of knowledge distillation for the student model is set as 0.01 by grid search over {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0}. The learning rate of C3M and baseline models is selected from {0.01, 0.001, 0.0005}.
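
For readers who want to re-create the reported setup, the 6:2:2 split and the search grids quoted in the table can be written down as a small configuration sketch. This is not taken from the released C3M code: the names `SEARCH_SPACE`, `iter_configs`, and `split_622`, the key names, and the seed are hypothetical. The quoted text also does not say whether the parameters were searched jointly or one at a time, so the full Cartesian product below is an assumption made for illustration.

```python
import itertools
import random

# Search grids quoted in the Experiment Setup row; key names are hypothetical.
SEARCH_SPACE = {
    "num_experts": [1, 2, 3, 4],
    "beta": [0.01, 0.05, 0.1, 0.5, 1],            # reported choice: 0.1
    "lambda_kd": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0],  # reported choice: 0.01
    "learning_rate": [0.01, 0.001, 0.0005],
}


def iter_configs(space):
    """Yield every combination of the grids (assumes a joint search)."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


def split_622(ids, seed=0):
    """Randomly split ids into train/validation/test at a 6:2:2 ratio.

    The seed is an assumption; the quoted text does not report one.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 6 // 10
    n_val = len(ids) * 2 // 10
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_622(range(100))
    print(len(train), len(val), len(test))
    print(sum(1 for _ in iter_configs(SEARCH_SPACE)), "candidate configurations")
```

Integer arithmetic is used for the split sizes so that rounding never drops or duplicates a patient; any remainder falls into the test partition.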