Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification
Authors: Junjie Zhou, Wei Shao, Yagao Yue, Wei Mu, Peng Wan, Qi Zhu, Daoqiang Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on three cancer cohorts derived from the cancer genome atlas (TCGA), and the experimental results indicate the advantage of MAPLE on few-shot pathology diagnosis tasks. Codes will be available at https://github.com/JJ-ZHOU-Code/MAPLE. |
| Researcher Affiliation | Academia | (1) The College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics; (2) The Key Laboratory of Brain-Machine Intelligence Technology, Ministry of Education; (3) The School of Engineering Medicine, Beihang University |
| Pseudocode | Yes | Algorithm 1 LLM-powered Prompt Construction |
| Open Source Code | No | Codes will be available at https://github.com/JJ-ZHOU-Code/MAPLE. The source code will be publicly available upon acceptance of the paper. |
| Open Datasets | Yes | Datasets. We evaluate MAPLE on three benchmark WSI datasets from The Cancer Genome Atlas (TCGA): TCGA-BRCA, TCGA-RCC, and TCGA-NSCLC. More details for the classification task on each cohort are provided in Appendix B.1. To simulate the few-shot learning scenario in clinical practice, we randomly sample K WSIs per class, where (K = 4, 8, 16 in our implementation). Footnote 2: https://portal.gdc.cancer.gov |
| Dataset Splits | Yes | To simulate the few-shot learning scenario in clinical practice, we randomly sample K WSIs per class, where (K = 4, 8, 16 in our implementation). We conduct five-fold cross-validation and the mean and standard deviation are calculated according to the results of all folds. |
| Hardware Specification | Yes | All experiments are conducted using PyTorch 2.0.1 and CUDA 11.7 on Python 3.8 with NVIDIA RTX 3090 GPUs. |
| Software Dependencies | Yes | All experiments are conducted using PyTorch 2.0.1 and CUDA 11.7 on Python 3.8 with NVIDIA RTX 3090 GPUs. |
| Experiment Setup | Yes | GPT-4 [2] is taken as the frozen large language model (LLM). For hyperparameter settings, the number of entities at each scale n_k is tuned from 4 to 20 with an interval of 4 (Section 4.1), while the number of neighbors for constructing the cross-scale entity graph n_e = 7 is tuned from 1 to 13 with interval 2 (Section 4.4). Since WSIs have different numbers of divided patches, we select the top r% tumor-related patches for each WSI as the top-k patches, and we tune r from 0.1 to 1 with interval 0.2. Finally, the weighting parameter λ (Section 4.5) for combining entity-level and slide-level predictions is tuned from 0 to 1 with interval 0.1. The model is optimized using AdamW with a learning rate of 1 × 10⁻⁴ and trained for up to 80 epochs with early stopping based on validation performance. |
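
The Dataset Splits and Experiment Setup rows can be made concrete with a small sketch. The Python below is not taken from the MAPLE code; it only enumerates the tuning ranges quoted in the table and shows one plausible way to draw K slides per class. The names `stepped`, `SEARCH_SPACE`, and `sample_few_shot` are hypothetical, and stopping the `r` grid at 0.9 is an assumption, since stepping by 0.2 from 0.1 never lands exactly on 1.

```python
import random
from collections import defaultdict


def stepped(start, stop, step, digits=1):
    """Inclusive stepped range, rounded to avoid floating-point drift."""
    values = []
    i = 0
    while True:
        v = round(start + i * step, digits)
        if v > stop:
            break
        values.append(v)
        i += 1
    return values


# Tuning ranges as quoted in the Experiment Setup row (key names are illustrative).
SEARCH_SPACE = {
    "n_k": list(range(4, 21, 4)),        # entities per scale: 4, 8, ..., 20
    "n_e": list(range(1, 14, 2)),        # graph neighbors: 1, 3, ..., 13
    "r": stepped(0.1, 1.0, 0.2),         # patch ratio; assumed to stop at 0.9
    "lambda": stepped(0.0, 1.0, 0.1),    # entity/slide weighting: 0.0, ..., 1.0
}


def sample_few_shot(labels, k, seed=0):
    """Randomly pick k slide ids per class.

    labels: dict mapping slide id -> class label (hypothetical input format).
    Raises ValueError if a class has fewer than k slides.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for slide_id, cls in sorted(labels.items()):
        by_class[cls].append(slide_id)
    return {cls: sorted(rng.sample(ids, k)) for cls, ids in by_class.items()}


if __name__ == "__main__":
    toy_labels = {f"slide_{i:02d}": i % 2 for i in range(20)}
    for k in (4, 8):  # the table reports K = 4, 8, 16
        print(k, sample_few_shot(toy_labels, k))
```

Fixing the seed keeps the sampled support set identical across runs. How the authors actually seed the sampling within each cross-validation fold is not stated in the excerpts above.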