Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models

Authors: Duy M. H. Nguyen, Nghiem Diep, Trung Nguyen, Hoang-Bao Le, Tai D. Nguyen, Anh-Tien Nguyen, TrungTin Nguyen, Nhat Ho, Pengtao Xie, Roger Wattenhofer, Daniel Sonntag, James Y Zou, Mathias Niepert

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, EXGRA-MED matches LLAVA-MED's performance using just 10% of pre-training data, achieving a 20.13% gain on VQA-RAD and approaching full-data performance. It also outperforms strong baselines like BIOMEDGPT and RADFM on visual chatbot and zero-shot classification tasks, demonstrating its promise for efficient, high-quality vision-language integration in medical AI.
Researcher Affiliation	Academia	1 German Research Centre for Artificial Intelligence (DFKI), 2 Max Planck Research School for Intelligent Systems (IMPRS-IS), 3 University of Stuttgart, 4 University Medical Center Göttingen, 5 Max Planck Institute for Multidisciplinary Sciences, 6 ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems, 7 School of Mathematical Sciences, Queensland University of Technology, 8 University of Oldenburg, 9 University of Texas at Austin, 10 University of California San Diego, 11 MBZUAI, 12 ETH Zurich, 13 Stanford University
Pseudocode	No	The paper includes block diagrams (e.g., Figure 3: Overview of EXGRA-MED) and mathematical formulations, but it does not contain explicit pseudocode or algorithm blocks with structured, step-by-step instructions in a code-like format.
Open Source Code	No	We will release our GitHub implementation if the paper is accepted.
Open Datasets	Yes	Pre-training data. We use the same dataset as LLaVA-Med [50]. Stage 1 includes 600K image-text pairs filtered from PMC-15M [106]... We test pre-trained models on three prominent biomedical VQA datasets: VQA-RAD [48], SLAKE [55], and PathVQA [31]... We assess the generalization of EXGRA-MED on zero-shot image classification by adapting public datasets from [34].
Dataset Splits	Yes	We pre-trained LLaVA-Med on varying data amounts (10%, 40%, 70%) and finetuned it on the VQA-RAD dataset... The dataset statistics are summarized in detail in Table 14. VQA-RAD (Train / Test): 313 / 203 images, 1797 / 451 QA pairs. SLAKE (Train / Val / Test): 450 / 96 / 96 images, 4919 / 1053 / 1061 QA pairs. PathVQA (Train / Val / Test): 2599 / 858 / 858 images, 19755 / 6279 / 6761 QA pairs.
Hardware Specification	Yes	We train EXGRA-MED using 4 A100-GPUs per with 80GB for both stages and complete the training process for stage 1 in 6.5 hours and for stage 2 in 7.5 hours.
Software Dependencies	No	We use the LLaMA-7B large language model [93], the CLIP-ViT-L-Patch14 visual encoder [77], and an MLP projection similar to LLaVA 1.5 [56]. The model is optimized using Adam [47] with Cosine Annealing LR scheduler and learning rates of 2e-3 and 2e-5 for stages 1 and 2, respectively.
Experiment Setup	Yes	We use the LLaMA-7B large language model [93], the CLIP-ViT-L-Patch14 visual encoder [77], and an MLP projection similar to LLaVA 1.5 [56]. Stage 1 follows the standard LLaVA-Med [50] setup, while stage 2 incorporates our multi-graph alignment with autoregressive training. For multi-graph alignment, a 2-layer graph convolutional network is applied to the output of the Projection and LLM Decoder (handling both image and text modalities). We train for 1 epoch in stage 1 and 3 epochs in stage 2... The model is optimized using Adam [47] with Cosine Annealing LR scheduler and learning rates of 2e-3 and 2e-5 for stages 1 and 2, respectively.
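
The Experiment Setup row reports a cosine annealing learning-rate schedule with base rates of 2e-3 (stage 1) and 2e-5 (stage 2). As a rough illustration of what such a schedule does, the sketch below computes the standard cosine annealing curve in plain Python. Only the two base learning rates come from the quoted text; the function name `cosine_annealing_lr`, the minimum rate of 0, the absence of warmup or restarts, and the step count of 1000 are assumptions made for this example and are not taken from the paper.

```python
import math

# Base learning rates quoted in the Experiment Setup row.
BASE_LR = {"stage1": 2e-3, "stage2": 2e-5}


def cosine_annealing_lr(step: int, total_steps: int, base_lr: float,
                        min_lr: float = 0.0) -> float:
    """Learning rate at `step` under cosine annealing.

    Assumes no warmup and no restarts (assumptions, not stated in the source).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay at the endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical step count, not taken from the paper
    for stage, lr in BASE_LR.items():
        samples = [cosine_annealing_lr(s, total, lr) for s in (0, total // 2, total)]
        print(stage, ["%.2e" % v for v in samples])
```

Under these assumptions the rate starts at the base value, passes through half of it at the midpoint, and decays to the minimum at the final step. The authors' actual schedule may differ, for example in warmup or minimum rate, since the quoted text does not specify those details.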