A case for reframing automated medical image classification as segmentation
Authors: Sarah Hooper, Mayee Chen, Khaled Saab, Kush Bhatia, Curtis Langlotz, Christopher Ré
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then implement methods for using segmentation models to classify medical images, which we call segmentation-for-classification, and compare these methods against traditional classification on three retrospective datasets (n=2,018–19,237). |
| Researcher Affiliation | Academia | Sarah M. Hooper, Electrical Engineering, Stanford University; Mayee F. Chen, Computer Science, Stanford University; Khaled Saab, Electrical Engineering, Stanford University; Kush Bhatia, Computer Science, Stanford University; Curtis Langlotz, Radiology and Biomedical Data Science, Stanford University; Christopher Ré, Computer Science, Stanford University |
| Pseudocode | Yes | Specifically, in Algorithm 1, we give the algorithm we use to compute a binary, image-level label from a probabilistic segmentation mask. In Algorithm 2 we provide the method we use to compute a probabilistic image-level label from a probabilistic segmentation mask. (An illustrative sketch of this mask-to-label step appears first below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to a source-code repository for the methodology described. |
| Open Datasets | Yes | We also evaluate three medical imaging datasets: CANDID, in which we aim to classify pneumothorax in chest x-rays (n=19,237) [29]; ISIC, in which we aim to classify melanoma from skin lesion photographs (n=2,750) [30]; and SPINE, in which we aim to classify cervical fractures in CT scans (n=2,018, RSNA 2022 Cervical Spine Fracture Detection Challenge). |
| Dataset Splits | Yes | We split this dataset randomly into 60% training images, 20% validation images, and 20% test images. We use the splits provided by the ISIC challenge organizers, resulting in 2000 training images, 150 validation images, and 600 test images. We randomly split this dataset into 60% training, 20% validation, and 20% test. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., specific GPU or CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We conduct our experiments using PyTorch Lightning (PyTorch version 1.9.0, Lightning version 1.5.10) [60, 61]. |
| Experiment Setup | Yes | We provide additional training details and information on hyperparameter tuning in Appendix A4.2 and A5.2... We train each network with an Adam optimizer and a learning rate of 1e-4, tuned from [1e-6, 1e-5, 1e-4, 1e-3]. We evaluated learning rates [1e-6, 1e-5, 1e-4, 1e-3, 1e-2] for each summarizing function and chose the learning rate that maximized the validation AUROC. (The last sketch below the table illustrates this selection loop.) |
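
The Pseudocode row describes two summarizing procedures: Algorithm 1 turns a probabilistic segmentation mask into a binary image-level label, and Algorithm 2 turns it into a probabilistic image-level label. The authors' exact rules are not reproduced in this summary, so the following is only a minimal sketch that assumes a pixel-count threshold for the binary label and a max-over-pixels summary for the probabilistic label; the names and thresholds are illustrative, not taken from the paper.

```python
import numpy as np

def binary_image_label(prob_mask: np.ndarray,
                       pixel_threshold: float = 0.5,
                       min_positive_pixels: int = 1) -> int:
    """Binarize the pixel probabilities, then call the image positive if
    enough pixels clear the threshold (an assumed rule, not Algorithm 1 verbatim)."""
    positive_pixels = int((prob_mask >= pixel_threshold).sum())
    return int(positive_pixels >= min_positive_pixels)

def probabilistic_image_label(prob_mask: np.ndarray) -> float:
    """Summarize pixel probabilities into one image-level probability;
    the max over pixels is one plausible summarizing function."""
    return float(prob_mask.max())

# A toy 4x4 mask with a single confident foreground pixel.
mask = np.zeros((4, 4))
mask[2, 1] = 0.9
print(binary_image_label(mask))         # -> 1
print(probabilistic_image_label(mask))  # -> 0.9
```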
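For the Dataset Splits row, the quoted 60/20/20 random split can be reproduced in spirit with a simple index permutation. The seed, any per-patient grouping, and the exact indices used by the authors are not stated, so this is an assumed illustration only.

```python
import numpy as np

def random_split(n_items: int, seed: int = 0):
    """Shuffle item indices and cut them into 60% train, 20% validation, 20% test."""
    rng = np.random.default_rng(seed)  # seed is an assumption, not from the paper
    idx = rng.permutation(n_items)
    n_train, n_val = int(0.6 * n_items), int(0.2 * n_items)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = random_split(19237)  # e.g., the CANDID dataset size
print(len(train_idx), len(val_idx), len(test_idx))
```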
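For the Experiment Setup row, the quoted procedure trains with Adam and picks the learning rate that maximizes validation AUROC. The sketch below follows that selection loop, but on synthetic tensors with a deliberately tiny placeholder model and scikit-learn's `roc_auc_score`; the paper's actual architectures, datasets, and training schedules are not reproduced here.

```python
import torch
from sklearn.metrics import roc_auc_score

# Synthetic stand-in data; the real chest x-ray, skin lesion, and CT images are not bundled here.
torch.manual_seed(0)
x_train, y_train = torch.randn(64, 1, 32, 32), torch.randint(0, 2, (64,))
x_val, y_val = torch.randn(32, 1, 32, 32), torch.randint(0, 2, (32,))

def train_and_score(lr: float, epochs: int = 3) -> float:
    """Train a tiny placeholder classifier with Adam at the given learning rate
    and return its validation AUROC."""
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x_train).squeeze(1), y_train.float())
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        scores = torch.sigmoid(model(x_val).squeeze(1))
    return roc_auc_score(y_val.numpy(), scores.numpy())

# Sweep the learning rates quoted in the paper and keep the one with the best validation AUROC.
results = {lr: train_and_score(lr) for lr in [1e-6, 1e-5, 1e-4, 1e-3]}
best_lr = max(results, key=results.get)
print(best_lr, results[best_lr])
```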