Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Democratizing Clinical Risk Prediction with Cross-Cohort Cross-Modal Knowledge Transfer

Authors: Qiannan Zhang, Manqi Zhou, Zilong Bai, Chang Su, Fei Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on real-world clinical data validate the effectiveness of our proposed model.
Researcher Affiliation | Academia | Qiannan Zhang, Manqi Zhou, Zilong Bai, Chang Su, Fei Wang (Weill Cornell Medicine, Cornell University)
Pseudocode | Yes | Algorithm 1: The Training Procedure of C3M on the Source Cohort
Open Source Code | Yes | We release the code in https://github.com/graph-ehr/C3M.
Open Datasets | Yes | We leverage the national All of Us Research Platform [39] as the source cohort, and three target cohorts respectively from one local EHR data warehouse and two sub-networks (denoted as INSIGHT-A and INSIGHT-B) from the INSIGHT Clinical Research Network [1] to simulate our setting.
Dataset Splits | Yes | For datasets, the All of Us cohort is randomly split into training/validation/testing at a 6:2:2 ratio.
Hardware Specification | No | Fine-tuning and evaluation on the target cohorts are both performed in CPU-only environments to assess practical deployment feasibility.
Software Dependencies | No | The paper mentions components such as the Adam optimizer, GCN, Transformer encoder, and MLP, along with various hyperparameters, but does not provide version numbers for key software components (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | To conduct graph-guided finetuning to obtain phenotypical representations, a two-layer GCN is adopted with 16 hidden units, along with a 16-dimensional embedding layer to represent 2634 medical concept nodes, and one transformation layer that transforms the foundation model output to initialize patient nodes. In addition, the transformer encoder consists of two layers with two heads, and we determine the expert number via search in {1, 2, 3, 4}, while the gene decoder is an MLP with one hidden layer. Attention modulation is achieved using a multi-head attention mechanism with two heads. Both the teacher and student models are implemented as multi-layer perceptrons. The trade-off parameter β for gene feature reconstruction is selected via grid search over {0.01, 0.05, 0.1, 0.5, 1} and set as 0.1. The trade-off parameter λ_KD of knowledge distillation for the student model is set as 0.01 by grid search over {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0}. The learning rate of C3M and baseline models is selected from {0.01, 0.001, 0.0005}.
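The Dataset Splits and Experiment Setup rows describe a 6:2:2 random split and several hyperparameter grids. The Python sketch below illustrates both under stated assumptions. It is not taken from the C3M repository. It assumes the split is made over patient identifiers with a fixed seed, and that each grid is searched by keeping the configuration with the best validation score. The helper names `split_6_2_2` and `grid_search`, the seed, and the `evaluate` callback are hypothetical. Only the grid values are copied from the Experiment Setup row.

```python
import itertools
import random

# Grid values copied from the Experiment Setup row.
EXPERT_GRID = [1, 2, 3, 4]                                  # number of experts
BETA_GRID = [0.01, 0.05, 0.1, 0.5, 1]                       # beta: gene reconstruction weight
LAMBDA_KD_GRID = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]  # lambda_KD: distillation weight
LR_GRID = [0.01, 0.001, 0.0005]                             # learning rate


def split_6_2_2(patient_ids, seed=0):
    """Randomly partition IDs into train/validation/test at a 6:2:2 ratio.

    Assumption: the split is over patient IDs; the seed is hypothetical.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so global random state is untouched
    n_train = len(ids) * 6 // 10      # integer arithmetic avoids float rounding
    n_val = len(ids) * 2 // 10
    # Any remainder from integer division goes to the test set.
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])


def grid_search(grids, evaluate):
    """Return the config (and its score) that maximizes evaluate(config).

    `grids` maps a hyperparameter name to its candidate values; `evaluate`
    is a hypothetical callback that trains a model and returns a validation
    score (higher is better).
    """
    names = list(grids)
    best_cfg, best_score = None, float("-inf")
    # Exhaustively try every combination in the Cartesian product of the grids.
    for values in itertools.product(*(grids[name] for name in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    train, val, test = split_6_2_2(range(1000), seed=42)
    print(len(train), len(val), len(test))  # 600 200 200

    # Toy stand-in for validation performance, peaking at beta=0.1, lr=0.001.
    def toy_evaluate(cfg):
        return -(abs(cfg["beta"] - 0.1) + abs(cfg["lr"] - 0.001))

    best, _ = grid_search({"beta": BETA_GRID, "lr": LR_GRID}, toy_evaluate)
    print(best)  # {'beta': 0.1, 'lr': 0.001}
```

Searching β, λ_KD, and the learning rate jointly would mean 5 × 7 × 3 = 105 configurations for each expert count. The row does not say whether the grids were searched jointly or one at a time, so `grid_search` accepts any subset of them.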