xMIL: Insightful Explanations for Multiple Instance Learning in Histopathology

Authors: Julius Hense, Mina Jamshidi Idaji, Oliver Eberle, Thomas Schnake, Jonas Dippel, Laure Ciernik, Oliver Buchstab, Andreas Mock, Frederick Klauschen, Klaus-Robert Müller

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate how to obtain improved MIL explanations using layer-wise relevance propagation (LRP) and conduct extensive evaluation experiments on three toy settings and four real-world histopathology datasets. Our approach consistently outperforms previous explanation attempts, with particularly improved faithfulness scores on challenging biomarker prediction tasks.
Researcher Affiliation | Collaboration | Julius Hense (1,2), Mina Jamshidi Idaji (1,2), Oliver Eberle (1,2), Thomas Schnake (1,2), Jonas Dippel (1,2,3), Laure Ciernik (1,2), Oliver Buchstab (4), Andreas Mock (4,5), Frederick Klauschen (1,4,5,6), Klaus-Robert Müller (1,2,7,8). Affiliations: 1 Berlin Institute for the Foundations of Learning and Data, Berlin, Germany; 2 Machine Learning Group, Technische Universität Berlin, Berlin, Germany; 3 Aignostics GmbH, Berlin, Germany; 4 Institute of Pathology, Ludwig Maximilian University, Munich, Germany; 5 German Cancer Research Center, Heidelberg, and German Cancer Consortium, Munich, Germany; 6 Institute of Pathology, Charité Universitätsmedizin, Berlin, Germany; 7 Department of Artificial Intelligence, Korea University, Seoul, Korea; 8 Max Planck Institute for Informatics, Saarbrücken, Germany
Pseudocode | No | The paper describes methods and processes in detailed text and figures (e.g., Figure 2, Section 3.2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at: https://github.com/bifold-pathomics/xMIL.
Open Datasets | Yes | To evaluate the performance of explanations on real-world histopathology prediction tasks, we considered four diverse datasets of increasing task difficulty covering tumor detection, disease subtyping, and biomarker prediction. These datasets had previously been used for benchmarking in multiple studies [12, 33, 46, 69]. ... We downloaded the TCGA HNSC, LUAD, and LUSC datasets from the TCGA website. The HPV status of the HNSC dataset and the TP53 mutations of the LUAD dataset were downloaded from cBioPortal [80, 81, 82].
Dataset Splits | Yes | CAMELYON16: We used the pre-defined test set of 130 slides and randomly split the remaining slides into 230 for training and 40 for validation. NSCLC: As in previous works [4, 33], we randomly split the slides into 60% training, 15% validation, and 25% test data. HNSC HPV: Due to the low number of HPV-positive samples, we uniformly split the dataset into three cross-validation folds as in previous work [12]. LUAD TP53: We randomly split the slides into 60% training, 15% validation, and 25% test data.
Hardware Specification | Yes | The training was done on an A100 80GB GPU.
Software Dependencies | No | The paper mentions using the 'TorchVision library [79]' and 'Captum [77]' but does not provide specific version numbers for these software dependencies, which would be necessary for full reproducibility.
Experiment Setup | Yes | AttnMIL models were trained for up to 1,000 epochs with batch size 32, and the TransMIL models for up to 200 epochs with batch size 5. We selected the checkpoint with the highest validation AUC. ... For AttnMIL, we found that the best configuration was always a learning rate of 0.002 and no dropout. For TransMIL, we ended up with a learning rate of 0.0002 and high dropout (0.2 after the feature extractor, 0.5 after the self-attention blocks and before the final classification layer) for CAMELYON16 and NSCLC, and a learning rate of 0.002 without dropout for HNSC HPV and LUAD TP53.
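The random slide-level 60/15/25 split reported for NSCLC and LUAD TP53 can be sketched as follows. This is a minimal illustration, not code from the xMIL repository; the function name, seed, and fraction arguments are hypothetical.

```python
import random

def split_slides(slide_ids, train_frac=0.60, val_frac=0.15, seed=0):
    """Randomly partition slide IDs into train/val/test sets.

    Hypothetical helper; the default fractions mirror the 60%/15%/25%
    split reported for the NSCLC and LUAD TP53 datasets.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ids = list(slide_ids)
    rng.shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return (ids[:n_train],
            ids[n_train:n_train + n_val],
            ids[n_train + n_val:])

train, val, test = split_slides(range(100))
```

For CAMELYON16, only the non-test slides would be shuffled this way (into 230 training and 40 validation slides), since the test set of 130 slides is pre-defined.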
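The hyperparameters in the Experiment Setup row can be collected into plain config dictionaries, which makes the per-dataset differences for TransMIL explicit. The key names below are illustrative only; the actual xMIL codebase may organize its configuration differently.

```python
# AttnMIL: one best configuration across all datasets (no dropout).
ATTNMIL = {
    "max_epochs": 1000,
    "batch_size": 32,
    "lr": 2e-3,
    "dropout": None,
}

# TransMIL: learning rate and dropout depend on the dataset.
_TRANSMIL_HIGH_DROPOUT = {
    "max_epochs": 200,
    "batch_size": 5,
    "lr": 2e-4,
    # 0.2 after the feature extractor, 0.5 after the self-attention
    # blocks and before the final classification layer
    "dropout": {"feature_extractor": 0.2,
                "self_attention": 0.5,
                "pre_classifier": 0.5},
}
_TRANSMIL_NO_DROPOUT = {
    "max_epochs": 200,
    "batch_size": 5,
    "lr": 2e-3,
    "dropout": None,
}
TRANSMIL = {
    "CAMELYON16": _TRANSMIL_HIGH_DROPOUT,
    "NSCLC": _TRANSMIL_HIGH_DROPOUT,
    "HNSC_HPV": _TRANSMIL_NO_DROPOUT,
    "LUAD_TP53": _TRANSMIL_NO_DROPOUT,
}
```

In all cases, model selection picks the checkpoint with the highest validation AUC.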