Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sequential Attention-based Sampling for Histopathological Analysis

Authors: Tarun Gogisetty, Naman Malpani, Gugan Chandrashekhar Mallika Thoppe, Sridharan Devarajan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show that SASHA matches state-of-the-art methods that analyze the WSI fully at high resolution, albeit at a fraction of their computational and memory costs. In addition, it significantly outperforms competing, sparse sampling methods. We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features. Section 1 (Introduction): Deep learning models have emerged as a major frontier in automated medical imaging diagnosis. For example, such models accurately identify tumorous tissue in whole-slide histopathological images (WSI) [14, 10, 12, 19]. Section 4.1 (Dataset and implementation details): We evaluate the effectiveness of SASHA with two publicly available benchmark datasets CAMELYON16 breast cancer [3] and TCGA-NSCLC lung cancer dataset [26]. While the CAMELYON16 labels correspond to normal versus pathological WSIs, TCGA-NSCLC labels correspond to distinguishing two different types of cancers, adenocarcinoma versus squamous cell carcinoma. We followed standardized train-test splits for both datasets. In addition, we also evaluate SASHA on multi-class classification tasks using the BRACS and Camelyon+ datasets. Further details on the datasets, hyperparameter selection, implementation, and training are in Appendices A.1 and A.2. Table 1: Performance comparison on the CAMELYON16 and TCGA-NSCLC datasets. The top 7 rows reflect models that process all patches at high resolution (Sample 100%). Underlined values indicate the best performance among these methods. Rows 9-11 and 12-14 reflect models that selectively sample either 10% or 20% of the patches, respectively. Bold values indicate the best performance among these methods at the respective sampling fraction. Table 2: Ablation study on CAMELYON16 with the SASHA-0.2 setup using 20% high-res patches. denotes the default, * the altered component.
Researcher Affiliation	Academia	Tarun G, Naman Malpani, Gugan Thoppe, and Devarajan Sridharan: Indian Institute of Science, Bangalore, India
Pseudocode	Yes	Algorithm 1: Agent-Environment Loop at Time Step t. Input: current state S_t ∈ ℝ^{N×d}; set I of sampled patch indices up to time t. Parameters: policy network π_θt; HAFED feature aggregator model f_H(·; θ_H) and classifier f_C(·; θ_C); TSU state update model f_S(·; θ_S). Output: updated state S_{t+1} and action a_{t+1} at time t + 1. 1: a_t ∼ π_θt(· | S_t) {sample patch index a_t from the policy distribution} 2: I ← I ∪ {a_t} {store selected patch index} 3: if training then 4: ŷ_t ← f_C(S_t; θ_C) {predict WSI label with HAFED classifier} 5: update π_θ with a PPO algorithm; intermediate reward r_t = −CE(y, ŷ_t) 6: end if 7: V(a_t) ← f_H(U_{a_t}; θ_H) {extract features V(a_t) ∈ ℝ^d from the high-resolution zoomed-in features U_{a_t} ∈ ℝ^{k×d} of the sampled patch with the HAFED model} 8: S_{t+1}(a_t) ← V(a_t) {update the sampled patch with its high-resolution feature representation} 9: C ← {i : cos(S_t(i), S_t(a_t)) > τ} {identify the set of patch indices whose cosine similarity with S_t(a_t) exceeds a threshold} 10: for each patch index i in C \ I do 11: S_{t+1}(i) ← f_S([S_t(i), S_t(a_t), V(a_t)]; θ_S) {targeted update of similar patches with the TSU model} 12: end for 13: return S_{t+1}, a_{t+1}
Open Source Code	Yes	Model implementation is available at: https://github.com/coglabiisc/SASHA
Open Datasets	Yes	We evaluate the effectiveness of SASHA with two publicly available benchmark datasets CAMELYON16 breast cancer [3] and TCGA-NSCLC lung cancer dataset [26]. While the CAMELYON16 labels correspond to normal versus pathological WSIs, TCGA-NSCLC labels correspond to distinguishing two different types of cancers, adenocarcinoma versus squamous cell carcinoma. We followed standardized train-test splits for both datasets. In addition, we also evaluate SASHA on multi-class classification tasks using the BRACS and Camelyon+ datasets. Further details on the datasets, hyperparameter selection, implementation, and training are in Appendices A.1 and A.2.
Dataset Splits	Yes	We followed standardized train-test splits for both datasets. In addition, we also evaluate SASHA on multi-class classification tasks using the BRACS and Camelyon+ datasets. Further details on the datasets, hyperparameter selection, implementation, and training are in Appendices A.1 and A.2. A.1 Dataset details: The CAMELYON16 dataset comprises 399 whole-slide images (WSIs) of lymph node sections. After pre-processing and segmentation, we generated non-overlapping patches of size 256 × 256 (× 3 RGB channels). This resulted in, on average, 800 patches per WSI at a scanning resolution of 5× and 12,500 patches at a high resolution of 20×. We used the official test set for evaluation [3]. The training set was further divided to create a validation set (10%) while maintaining a similar proportion of normal and tumor slides in each subset. The TCGA-NSCLC dataset comprises 1,054 whole-slide images (WSIs) obtained from the TCGA-NSCLC repository under the TCGA-LUSC and TCGA-LUAD projects. Ten low-quality slides without magnification information were discarded, and three slides with errors were ignored. We followed the same test split as in DSMIL [18]. The test set comprised a total of 213 slides: 104 from TCGA-LUAD and 109 from TCGA-LUSC. Following preprocessing, we obtained, on average, 212 patches per WSI at 5× magnification and 3,393 patches at 20× magnification. The training set was further divided to create a validation set (20%) while maintaining a similar proportion of labels in each subset. Table 3: Summary of datasets used in this study. CAMELYON16 involves binary classification of normal versus tumor slides, while TCGA-NSCLC involves distinguishing between adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC). The table shows the number of WSIs for each class in the training, validation, and test sets. Here, Class 1 refers to Normal (for CAMELYON16) and LUAD (for TCGA-NSCLC), while Class 2 refers to Tumor (for CAMELYON16) and LUSC (for TCGA-NSCLC). Table 4: Summary of Camelyon+ dataset. The WSIs have the following 4-class slide-level labels: Negative (Neg.), Micro-metastasis (Mi-m.), Macro-metastasis (Ma-m.), and Isolated Tumor Cells (ITC). The table shows the number of WSIs for each class in the training, validation, and test sets. Table 5: BRACS dataset distribution for 3-class classification (Benign, Atypical, Malignant) across training, validation, and test splits.
Hardware Specification	Yes	A.5 Hardware: For feature extraction and model inference, we utilized an NVIDIA GeForce GTX 1080 Ti GPU with 12 GB of memory. Feature extraction was carried out with a batch size of 512. Training for HAFED and TSU was also performed on the same GPU. To accelerate the training of the RL agent described in Section 3.4, we employed an NVIDIA Tesla V100 GPU with 32 GB of memory.
Software Dependencies	No	The paper mentions using a 'Vision Transformer (ViT) pretrained on histopathological images' and the 'AdamW optimizer', and also that 'operations were performed using the code provided as part of the CLAM [22] GitHub repository'. However, it does not specify concrete version numbers for general software dependencies like Python, PyTorch, or TensorFlow, nor for CLAM or the ViT implementation itself.
Experiment Setup	Yes	A.2 Implementation and Hyper-parameter Selection. A.2.1 Patch feature distillation with HAFED: ...HAFED is trained with an annealed learning rate schedule starting at 4 × 10⁻⁴, with the AdamW optimizer and a weight decay of 1 × 10⁻⁴. ... A.2.2 Updating WSI state with TSU: ...τ was tuned as a hyperparameter (Table 7) ... The TSU model is trained for 500 epochs with an Adam optimizer and a learning rate of 1 × 10⁻⁴. A.2.3 Actor and Critic: ...Both networks are trained for 15 epochs using the AdamW optimizer with an annealing learning rate starting from 1 × 10⁻⁵ and a weight decay of 1 × 10⁻³. Network hyperparameters are reported in Table 7. Table 7: Hyperparameters for the CAMELYON16 and TCGA-NSCLC datasets. In the Values column, entries separated by a slash (/) represent values for SASHA-0.1 and SASHA-0.2 models respectively; a singleton entry reflects the same value across both SASHA variants.
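The Pseudocode row's Algorithm 1 ends with a targeted state update (steps 7 to 12). The sampled patch receives its high-resolution features, and unsampled patches that resemble it are revised as well. The plain-Python sketch below illustrates only that update and is not the authors' implementation, which is at the linked repository. It rests on several assumptions:

- The state S_t is a list of d-dimensional vectors, one per patch.
- V(a_t) is passed in precomputed rather than produced by HAFED.
- The learned TSU model f_S is replaced by a hypothetical `blend_update` function.

The names `cosine`, `targeted_update`, and `blend_update` are illustrative.

```python
import math
from typing import Callable, List, Set

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def targeted_update(
    state: List[Vector],
    sampled: Set[int],
    a_t: int,
    v_at: Vector,
    tau: float,
    update_fn: Callable[[Vector, Vector, Vector], Vector],
) -> List[Vector]:
    """Return S_{t+1} from S_t, the sampled set I (already containing a_t), and V(a_t)."""
    new_state = [row[:] for row in state]      # copy so S_t itself is not mutated
    new_state[a_t] = list(v_at)                # step 8: S_{t+1}(a_t) <- V(a_t)
    anchor = state[a_t]                        # step 9 compares against S_t(a_t)
    similar = {i for i, row in enumerate(state) if cosine(row, anchor) > tau}
    for i in similar - sampled:                # step 10: skip already-sampled patches
        # step 11: update from [S_t(i), S_t(a_t), V(a_t)]
        new_state[i] = update_fn(state[i], anchor, v_at)
    return new_state


def blend_update(s_i: Vector, s_at: Vector, v_at: Vector, alpha: float = 0.5) -> Vector:
    """Hypothetical stand-in for the learned TSU model f_S.

    Shifts S_t(i) by alpha * (V(a_t) - S_t(a_t)).
    """
    return [x + alpha * (v - a) for x, a, v in zip(s_i, s_at, v_at)]
```

For example, take the state `[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]` with `sampled={0}`, `a_t=0`, `v_at=[2.0, 0.0]`, and `tau=0.8`. Patch 1 is nearly parallel to patch 0 (cosine ≈ 0.99), so it is updated to roughly `[1.4, 0.1]`. The orthogonal patch 2 is left unchanged. In the real model the learned f_S replaces `blend_update`.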
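The Dataset Splits row says each training set was further divided into a validation set (10% for CAMELYON16, 20% for TCGA-NSCLC). The division kept class proportions similar, but the row does not give the exact procedure. A per-class (stratified) hold-out is one common way to achieve this. The sketch below assumes slide IDs mapped to labels and a fixed random seed. The function `stratified_holdout` and its parameters are hypothetical and are not taken from the SASHA code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_holdout(
    labels: Dict[str, str], val_frac: float, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split slide IDs into (train, val), holding out about val_frac of each class."""
    by_class: Dict[str, List[str]] = defaultdict(list)
    for slide_id, label in labels.items():
        by_class[label].append(slide_id)

    rng = random.Random(seed)                  # local RNG: the split depends only on seed
    train: List[str] = []
    val: List[str] = []
    for label in sorted(by_class):             # deterministic class order
        ids = sorted(by_class[label])          # sort before shuffling for reproducibility
        rng.shuffle(ids)
        n_val = round(len(ids) * val_frac)     # per-class count keeps proportions similar
        val.extend(ids[:n_val])
        train.extend(ids[n_val:])
    return train, val
```

Consider 100 normal and 60 tumor slides with a 10% hold-out. The sketch places 10 normal and 6 tumor slides in validation, so the validation label mix matches the training mix. That is the property the row describes.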
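The Experiment Setup row says that HAFED and the actor-critic networks use an "annealed" learning rate with given starting values, but it does not name the schedule. Cosine annealing is one common choice and is assumed here purely for illustration. It follows the closed form documented for PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`. The `CONFIG` dictionary only restates values quoted in the row, and its key names are hypothetical. The HAFED epoch count is omitted because the excerpt does not give it.

```python
import math


def cosine_annealed_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine annealing (an assumed schedule).

    Returns base_lr at step 0 and decays to min_lr at total_steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step, 0), total_steps) / total_steps   # clamp to [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical configuration restating values quoted in the Experiment Setup row.
CONFIG = {
    "HAFED": {"optimizer": "AdamW", "initial_lr": 4e-4, "weight_decay": 1e-4},
    "TSU": {"optimizer": "Adam", "lr": 1e-4, "epochs": 500},
    "actor_critic": {"optimizer": "AdamW", "initial_lr": 1e-5, "weight_decay": 1e-3, "epochs": 15},
}
```

Take the actor and critic (15 epochs, starting at 1 × 10⁻⁵). Under this assumed schedule, `cosine_annealed_lr(epoch, 15, 1e-5)` starts at 1 × 10⁻⁵, is about 5 × 10⁻⁶ halfway through, and reaches 0 at epoch 15.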