Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

Authors: Fuying Wang, Yuyin Zhou, Shujun Wang, Varut Vardhanabhuti, Lequan Yu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on seven downstream medical image datasets covering image classification, object detection, and semantic segmentation tasks demonstrate the stable and superior performance of our framework.
Researcher Affiliation | Academia | 1 The University of Hong Kong, 2 University of California, Santa Cruz, 3 University of Cambridge; {fuyingw@connect., varv@, lqyu@}hku.hk, yzhou284@ucsc.edu, sw991@cam.ac.uk
Pseudocode | No | The paper describes the modules and their mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is in https://github.com/fuying-wang/MGCA.
Open Datasets | Yes | We pre-train our MGCA framework on the JPG version of the MIMIC-CXR 2.0.0 dataset [31]. Medical image classification on three representative datasets: (1) CheXpert [29], (2) RSNA Pneumonia [47], (3) COVIDx [53]. Medical object detection on two tasks: (1) RSNA Pneumonia [47], (2) Object-CXR [26]. Medical semantic segmentation on the SIIM and RSNA datasets: (1) SIIM Pneumothorax [18], (2) RSNA Pneumonia [47].
Dataset Splits | Yes | Following [61], we hold out the expert-labeled validation set as test data and randomly select 5,000 radiographs from the training data for validation. Following [27], we manually split the dataset into training, validation, and test sets with a 70%/15%/15% ratio. We use the original validation dataset as test data and manually split 10% of the original training set for validation. We randomly split the original training set into 16,010/5,337/5,337 for training/validation/testing. We use the original development set as the test set (1,000) and randomly split the original training set into training (6,400) and validation (1,600) sets. The train/validation/test splits respectively constitute 70%/30%/30% of the original dataset. (See the split sketch after the table.)
Hardware Specification | Yes | We train our framework for 50 epochs on 2 RTX 3090 GPUs with a batch size of 144. (See the trainer sketch after the table.)
Software Dependencies | No | The paper mentions using BioClinicalBERT [1] and ViT-B/16 [12] but does not specify their version numbers or other software dependencies with specific versions.
Experiment Setup | Yes | The optimizer is AdamW [40] with a learning rate of 2e-5 and a weight decay of 0.05. We use a linear warmup with cosine annealing scheduler [39]. We initialize the learning rate as 1e-8 and the warmup epochs as 20. Following the practice in contrastive learning [4, 24], the dimension d = 128 and the temperature hyperparameters are τ1 = 0.1, τ2 = 0.07, τ3 = 0.2. The number of prototypes is K = 500. We set λ1 = 1, λ2 = 1, λ3 = 1. (See the hyperparameter sketch after the table.)
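
The Dataset Splits row above quotes ratio-based random splits (e.g. 70%/15%/15% of the original training set). Below is a minimal, generic sketch of such a split; the helper name, seed, and sample count are illustrative assumptions rather than the authors' code, and the officially released splits should be preferred wherever available.

```python
# Illustrative ratio-based random split (70%/15%/15%); not the authors' code.
import numpy as np

def split_indices(n_samples: int, seed: int = 42):
    """Return train/val/test index arrays with a 70/15/15 ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(0.70 * n_samples)
    n_val = int(0.15 * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(10_000)  # sample count is a placeholder
```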
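
For the Hardware Specification row, the following is a minimal sketch of a matching distributed-training configuration. It assumes PyTorch Lightning and an even division of the global batch size of 144 into 72 samples per GPU; the paper does not state which training framework or per-GPU batch size it uses.

```python
# Hedged sketch of a 2-GPU, 50-epoch training run; the framework choice is an assumption.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=50,        # 50 pre-training epochs, as quoted
    accelerator="gpu",
    devices=2,            # two RTX 3090 GPUs
    strategy="ddp",       # data parallelism across the two devices
)
# trainer.fit(model, train_dataloader)  # dataloaders built with batch_size=72 per GPU
```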
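
The Experiment Setup row can be read as the PyTorch sketch below. Only the numeric values are taken from the quoted text; the placeholder model, the warmup_cosine helper, and the epoch-level LambdaLR scheduling are assumptions made for illustration.

```python
# Hedged sketch of the quoted pre-training hyperparameters; only the numbers come
# from the paper, everything else is a stand-in.
import math
import torch

EPOCHS, WARMUP_EPOCHS = 50, 20     # total and warmup epochs
BASE_LR, START_LR = 2e-5, 1e-8     # target and initial learning rates

model = torch.nn.Linear(768, 128)  # placeholder for the encoders; embedding dim d = 128
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

def warmup_cosine(epoch: int) -> float:
    """Linear warmup from START_LR to BASE_LR, then cosine annealing towards zero."""
    if epoch < WARMUP_EPOCHS:
        return (START_LR + (BASE_LR - START_LR) * epoch / WARMUP_EPOCHS) / BASE_LR
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)

# Remaining quoted hyperparameters: temperatures of the contrastive objectives,
# number of prototypes, and loss weights.
TAU = (0.1, 0.07, 0.2)       # τ1, τ2, τ3
NUM_PROTOTYPES = 500         # K
LAMBDAS = (1.0, 1.0, 1.0)    # λ1, λ2, λ3
```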