Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Authors: Fuying Wang, Yuyin Zhou, Shujun Wang, Varut Vardhanabhuti, Lequan Yu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on seven downstream medical image datasets covering image classification, object detection, and semantic segmentation tasks demonstrate the stable and superior performance of our framework. |
| Researcher Affiliation | Academia | 1The University of Hong Kong 2University of California, Santa Cruz 3University of Cambridge {fuyingw@connect., varv@, lqyu@}hku.hk yzhou284@ucsc.edu sw991@cam.ac.uk |
| Pseudocode | No | The paper describes the modules and their mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is in https://github.com/fuying-wang/MGCA. |
| Open Datasets | Yes | We pre-train our MGCA framework on the JPG version of MIMIC-CXR 2.0.0 dataset [31]. Medical Image Classification on three representative datasets: (1) CheXpert [29], (2) RSNA Pneumonia [47], (3) COVIDx [53]. Medical Object Detection on two object detection tasks: (1) RSNA Pneumonia [47], (2) Object-CXR [26]. Medical Semantic Segmentation on SIIM and RSNA datasets: (1) SIIM Pneumothorax [18], (2) RSNA Pneumonia [47]. |
| Dataset Splits | Yes | Following [61], we hold out the expert-labeled validation set as test data and randomly select 5,000 radiographs from the training data for validation. Following [27], we manually split the dataset into training, validation, and test sets with a 70%/15%/15% ratio. We use the original validation dataset as test data and manually split off 10% of the original training set for validation. We randomly split the original training set into 16,010/5,337/5,337 for training/validation/testing. We use the original development set as the test set (1,000) and randomly split the original training set into training (6,400) and validation (1,600) sets. The train/validation/test splits respectively constitute 70%/30%/30% of the original dataset. |
| Hardware Specification | Yes | We train our framework for 50 epochs on 2 RTX 3090 GPUs with a batch size of 144. |
| Software Dependencies | No | The paper mentions using BioClinicalBERT [1] and ViT-B/16 [12] but does not specify their version numbers or other software dependencies with specific versions. |
| Experiment Setup | Yes | The optimizer is AdamW [40] with a learning rate of 2e-5 and weight decay of 0.05. We use a linear warmup with a cosine annealing scheduler [39]. We initialize the learning rate as 1e-8 and the warmup epochs as 20. Following the practice in contrastive learning [4, 24], the dimension d = 128 and the temperature hyperparameters are τ1 = 0.1, τ2 = 0.07, τ3 = 0.2. The number of prototypes is K = 500. We set λ1 = 1, λ2 = 1, λ3 = 1. |
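The warmup-plus-cosine learning-rate schedule quoted in the Experiment Setup row can be sketched in plain Python. This is a minimal sketch under stated assumptions: `lr_at` and the constant names are illustrative (not from the paper's code), and in an actual run this schedule would typically drive a PyTorch `AdamW` optimizer via `torch.optim.lr_scheduler.LambdaLR`.

```python
import math

# Hyperparameters quoted from the Experiment Setup row above.
EPOCHS = 50             # total pre-training epochs
WARMUP_EPOCHS = 20      # linear warmup duration
BASE_LR = 2e-5          # peak AdamW learning rate
WARMUP_START_LR = 1e-8  # initial learning rate

def lr_at(epoch: int) -> float:
    """Learning rate at a given epoch: linear warmup, then cosine annealing."""
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from WARMUP_START_LR up to BASE_LR.
        frac = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + frac * (BASE_LR - WARMUP_START_LR)
    # Cosine decay from BASE_LR down to 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

# Per-epoch schedule: ramps 1e-8 -> 2e-5 over 20 epochs, then decays toward 0.
schedule = [lr_at(e) for e in range(EPOCHS)]
```

In PyTorch this per-epoch function can be wrapped as `LambdaLR(optimizer, lr_lambda=lambda e: lr_at(e) / BASE_LR)`, since `LambdaLR` expects a multiplicative factor on the base learning rate.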