Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
Authors: Yuxuan Sun, Yunlong Zhang, Yixuan Si, Chenglu Zhu, Kai Zhang, Zhongyi Shui, Jingxiong Li, Xuan Gong, Xinheng Lyu, Tao Lin, Lin Yang
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that integrating these generated pairs with existing datasets to train a pathology-specific CLIP model, PathGen-CLIP, significantly enhances its ability to analyze pathological images, with substantial improvements across nine pathology-related zero-shot image classification tasks and three whole-slide image tasks. |
| Researcher Affiliation | Collaboration | Yuxuan Sun (1,2), Yunlong Zhang (1,2), Yixuan Si (2), Chenglu Zhu (2), Kai Zhang (3), Zhongyi Shui (1,2), Jingxiong Li (1,2), Xuan Gong (4), Xinheng Lyu (2), Tao Lin (2), Lin Yang (2,5); 1: Zhejiang University, 2: Westlake University, 3: Ohio State University, 4: Harvard University, 5: Center for Interdisciplinary Research and Innovation, Muyuan |
| Pseudocode | No | The paper describes the data construction pipeline and steps in prose and with figures (e.g., Figure 2 for the pipeline), but it does not contain explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Our dataset, code, and model are open-access at PathGen-1.6M. The datasets proposed in this study are publicly released and maintained for long-term accessibility at GitHub PathGen-1.6M. Additionally, we have made PathGen-CLIP, PathGen-CLIP-L, and PathGen-LLaVA openly available, enabling the research community to fully reproduce our results. |
| Open Datasets | Yes | Our dataset, code, and model are open-access at PathGen-1.6M. The Cancer Genome Atlas (TCGA) is a comprehensive, publicly funded project that provides clinical data across various cancer types. The datasets proposed in this study are publicly released and maintained for long-term accessibility at GitHub PathGen-1.6M. Table 13: Datasets used in our study and their corresponding source links: PatchCamelyon17: https://patchcamelyon.grand-challenge.org/Download/; CRC-100K: https://zenodo.org/records/1214456; SICAPv2: https://data.mendeley.com/datasets/9xxm58dvs3/1; BACH: https://iciar2018-challenge.grand-challenge.org/Dataset/; Osteo: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210706; Skin Cancer: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/7QCR8S; MHIST: https://bmirds.github.io/MHIST; WSSS4LUAD: https://wsss4luad.grand-challenge.org/; LC25000 (LC-Lung and LC-Colon): https://github.com/tampapath/lung_colon_image_set?tab=readme-ov-file; BRACS: https://www.bracs.icar.cnr.it/; Camelyon17: https://camelyon17.grand-challenge.org/Data/; Camelyon16: https://camelyon16.grand-challenge.org/Data/; PathMMU: https://pathmmu-benchmark.github.io/#/ |
| Dataset Splits | Yes | CAMELYON16 consists of 400 WSIs, with 270 assigned for training and 130 for testing. To enhance model validation, the training set is further divided into training and validation subsets in a 9:1 ratio... CAMELYON17 comprises 1,000 WSIs... 200 WSIs from the fourth and fifth hospitals are designated as the test set, while the remaining 300 WSIs are split into training and validation sets in a 9:1 ratio. BRACS (BReAst Carcinoma Subtyping) includes 547 WSIs... The dataset is officially split into 395 training images, 65 validation images, and 87 test images... For the linear probe experiment... We conduct the experiment... The procedure involves randomly selecting 256 samples from each class to form the training set. |
| Hardware Specification | Yes | We utilize 24 NVIDIA A100-80G GPUs for caption generation, 8 NVIDIA A100-80G GPUs for training the PathGen-LLaVA model, 4 NVIDIA A100-80G GPUs for fine-tuning LLaMA, and 4 NVIDIA A100-40G GPUs for training and testing on downstream datasets. |
| Software Dependencies | Yes | For the CLIP training, we adhere to the open_clip framework and use OpenAI CLIP as initialization. For the training of PathGen-LLaVA, we use our trained PathGen-CLIP-L as the vision encoder and LLaVA-v1.5-13B (Liu et al., 2024) as the LLM component. We fully adhere to the training framework and parameters provided in the LLaVA framework. |
| Experiment Setup | Yes | For the CLIP training, we adhere to the open_clip framework and use OpenAI CLIP as initialization. We use a learning rate of 3e-5 with an Adam optimizer that includes a weight decay of 0.1. We set a batch size of 96 across 4 NVIDIA A100 GPUs, resulting in an effective batch size of 384. In the first stage of training using PathGen-1.6M, we limit the training to only one epoch. For the second stage of training with PathGen-init, we conduct two epochs. For the training of PathGen-LLaVA... The training follows a two-stage process: in the first stage, we align the LLM with PathGen-CLIP-L using the PathGen-init dataset, and in the second stage, we train using the PathGen-Instruct-200K dataset... The linear probe experiment... batch size of 32 and run it for 20 epochs. The optimizer used is AdamW with a learning rate of 1×10⁻²... The models are trained for 50 epochs using a cosine learning rate decay schedule. The initial learning rates are determined through a grid search within the range [0.0001, 0.0002, 0.0005]... The training process utilizes the Adam optimizer with a weight decay of 0.0001, and the batch size is consistently set to 1. Otherwise, we set M = 5, K = 10, and p = 0.6 for ACMIL. |
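The two-stage CLIP training setup quoted in the Experiment Setup row can be summarized as a configuration. The sketch below is illustrative only, not the authors' code: the function name `clip_training_config` and the dict layout are our own, while every value mirrors a hyperparameter stated in the row above (dataset names normalized from the quoted text).

```python
# Minimal sketch of the reported two-stage PathGen-CLIP training setup.
# All values are taken from the Experiment Setup row; the structure is ours.

def clip_training_config():
    per_gpu_batch, num_gpus = 96, 4
    return {
        "init_weights": "OpenAI CLIP",     # used as initialization (open_clip)
        "optimizer": "Adam",
        "learning_rate": 3e-5,
        "weight_decay": 0.1,
        "effective_batch_size": per_gpu_batch * num_gpus,  # 96 x 4 = 384
        "stages": [
            {"dataset": "PathGen-1.6M", "epochs": 1},  # stage 1: one epoch
            {"dataset": "PathGen-init", "epochs": 2},  # stage 2: two epochs
        ],
    }

cfg = clip_training_config()
print(cfg["effective_batch_size"])  # -> 384
```

The effective batch size follows directly from the reported per-GPU batch of 96 replicated across 4 GPUs.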