Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-Modal Foundation Models for Computational Pathology: A Survey

Authors: Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Xiaohui Chen, Yi He, Zhong Chen, Peter K. Sorger, Chen Zhao

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility assessment (variable: result, followed by the LLM response):
Research Type: Experimental. In this survey, we provide a comprehensive and up-to-date review of multi-modal foundation models in CPath, with a particular focus on models built upon hematoxylin and eosin (H&E) stained whole slide images (WSIs) and tile-level representations. We categorize 34 state-of-the-art multi-modal foundation models into three major paradigms: vision-language, vision-knowledge graph, and vision-gene expression. We further divide vision-language models into non-LLM-based and LLM-based approaches. Additionally, we analyze 30 available multi-modal datasets tailored for pathology, grouped into image-text pairs, instruction datasets, and image-other modality pairs. Our survey also presents a taxonomy of downstream tasks, highlights training and evaluation strategies, and identifies key challenges and future directions. ... Table 4 presents the F1-score and accuracy of eight MMFM4CPath models across nine datasets, namely WSSS4LUAD (Han et al., 2022), LC25000Lung (Borkowski et al., 2019), LC25000Colon (Borkowski et al., 2019), BACH (Aresta et al., 2019), CRC100K (Kather et al., 2018), Osteo (Arunachalam et al., 2019), SICAPv2 (Silva-Rodríguez et al., 2020), PCam (Veeling et al., 2018), and Skin Cancer (Kriegsmann et al., 2022). The results indicate that even when evaluated on the same dataset and metric, different MMFM4CPath models exhibit their own tendencies. Among them, CPath-Omni, PathGen-CLIP, and KEEP demonstrate superior and more robust performance across datasets. Table 5 shows the performance of six MMFM4CPath models on the ARCH-PubMed (Gamper & Rajpoot, 2021b) and ARCH-Book (Gamper & Rajpoot, 2021b) datasets for tile-to-caption (image-to-text) and caption-to-tile (text-to-image) retrieval, measured by Top-K Recall (R@K, K = {5, 10, 50}).
Researcher Affiliation: Academia.
Dong Li EMAIL, Department of Computer Science, Baylor University
Guihong Wan EMAIL, Department of Dermatology, Massachusetts General Hospital, Harvard Medical School
Xintao Wu EMAIL, Electrical Engineering and Computer Science Department, University of Arkansas
Xinyu Wu EMAIL, Department of Computer Science, Baylor University
Xiaohui Chen EMAIL, Department of Computer Science, Baylor University
Yi He EMAIL, Department of Data Science, The College of William and Mary
Zhong Chen EMAIL, School of Computing, Southern Illinois University
Peter K. Sorger EMAIL, Department of Systems Biology, Harvard Medical School
Chen Zhao EMAIL, Department of Computer Science, Baylor University
Pseudocode: No. The paper is a survey and describes the methodologies of other papers in prose; it presents no structured pseudocode or algorithm blocks of its own.
Open Source Code: No. The paper is a survey reviewing existing multi-modal foundation models; it does not introduce a new method for which the authors would release code.
Open Datasets: Yes. We summarize existing multi-modal datasets for CPath, highlighting high-quality datasets or those that have demonstrated success in current models. Based on data types, we categorize them into three groups as shown in Table 3. ... Table 4 presents the F1-score and accuracy of eight MMFM4CPath models across nine datasets, namely WSSS4LUAD (Han et al., 2022), LC25000Lung (Borkowski et al., 2019), LC25000Colon (Borkowski et al., 2019), BACH (Aresta et al., 2019), CRC100K (Kather et al., 2018), Osteo (Arunachalam et al., 2019), SICAPv2 (Silva-Rodríguez et al., 2020), PCam (Veeling et al., 2018), and Skin Cancer (Kriegsmann et al., 2022). Table 5 shows the performance of six MMFM4CPath models on the ARCH-PubMed (Gamper & Rajpoot, 2021b) and ARCH-Book (Gamper & Rajpoot, 2021b) datasets...
Dataset Splits: No. The paper is a survey that compares results from other research. While it references several public datasets on which these models were evaluated (e.g., WSSS4LUAD, LC25000Lung, BACH), it does not specify dataset splits for its own comparative analysis or for reproducing its findings, as those details belong to the original papers that introduced and evaluated the models.
Hardware Specification: No. As a survey paper, this document reviews existing research and does not describe new experimental runs conducted by the authors. Therefore, it does not provide any hardware specifications for experimental replication.
Software Dependencies: No. This is a survey paper and does not present new experimental work or methodology requiring specific software dependencies for replication. Therefore, no software versions are mentioned.
Experiment Setup: No. Being a survey paper, this document synthesizes and analyzes existing research rather than conducting new experiments. Consequently, it does not provide details regarding its own experimental setup, hyperparameters, or system-level training settings.
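For reference, the Top-K Recall (R@K) metric cited in the retrieval evidence above can be sketched as follows. This is an illustrative sketch only: the `recall_at_k` helper and the toy similarity matrix are assumptions for the example, not code from the surveyed models, and it assumes each query's ground-truth match sits at the same index among the candidates (the diagonal of the similarity matrix).

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Recall@K for cross-modal retrieval: the fraction of queries whose
    ground-truth match (assumed here to be the same-index candidate)
    appears among the k highest-scored candidates."""
    topk = np.argsort(-similarity, axis=1)[:, :k]    # k best candidates per query
    truth = np.arange(similarity.shape[0])[:, None]  # ground-truth index per query
    return float((topk == truth).any(axis=1).mean())

# Toy similarity matrix: 4 image queries x 4 caption candidates,
# e.g. cosine similarities from a vision-language model.
sim = np.array([
    [0.9, 0.1, 0.2, 0.3],   # query 0: correct caption ranked 1st
    [0.2, 0.8, 0.4, 0.1],   # query 1: correct caption ranked 1st
    [0.7, 0.6, 0.5, 0.1],   # query 2: correct caption ranked 3rd
    [0.3, 0.2, 0.1, 0.05],  # query 3: correct caption ranked 4th
])

print(recall_at_k(sim, 1))  # 0.5  (queries 0 and 1 hit)
print(recall_at_k(sim, 3))  # 0.75 (queries 0, 1, and 2 hit)
```

The same function covers both retrieval directions reported in Table 5: scoring captions for each tile gives tile-to-caption R@K, and scoring tiles for each caption (the transposed matrix) gives caption-to-tile R@K.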