G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training

Authors: Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments to validate the efficacy of the proposed G2D approach, which outperforms peer approaches across five uni-modal and cross-modal downstream tasks. In this section, we compare our approach with SOTA medical VLP techniques. The implementation details and dataset training/test splits are reported in Sec A.3, A.4.
Researcher Affiliation | Collaboration | Che Liu (1,2), Cheng Ouyang (3,8,9), Sibo Cheng (10), Anand Shah (6,7), Wenjia Bai (2,3,4), Rossella Arcucci (1,2). 1: Department of Earth Science and Engineering, Imperial College London, UK; 2: Data Science Institute, Imperial College London, UK; 3: Department of Computing, Imperial College London, UK; 4: Department of Brain Sciences, Imperial College London, UK; 6: Department of Infectious Disease Epidemiology, Imperial College London, UK; 7: Royal Brompton and Harefield Hospitals, UK; 8: Department of Engineering Science, University of Oxford, Oxford, UK; 9: Institute of Clinical Sciences, Imperial College London, UK; 10: CEREA, École des Ponts and EDF R&D, Île-de-France, France.
Pseudocode | No | The paper describes the methodology using textual descriptions and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | The code can be found at https://github.com/cheliucomputation/G2D-NeurIPS24/tree/main.
Open Datasets | Yes | We utilise the MIMIC-CXR dataset [29, 24]. For downstream tasks, our focus is to evaluate the efficacy of G2D in learning granular visual features that can be used for localisation, vision-language understanding, and visual recognition tasks. We examine the capability and transferability of the learned cross-modal representations by using them for five distinct medical imaging tasks, covering a spectrum of 25 different diseases.
Dataset Splits | Yes | Table 6: Details on Data Split. The symbol / denotes that training/validation data is not required for the zero-shot tasks. Example row: Task: Linear Classification; Dataset: CheXpert [35]; Train: 186,027; Valid: 5,000; Test: 202.
Hardware Specification | Yes | In line with [9, 8], G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The downstream tasks are deployed on a 40 GB A100 GPU.
Software Dependencies | No | The paper mentions using the "PyTorch vision library" but does not specify a version number for PyTorch or the vision library itself. It also refers to "ClinicalBERT" as the text encoder, but without a specific version.
Experiment Setup | Yes | G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The AdamW optimizer is employed with a learning rate set to 2×10^-4 and a weight decay of 1×10^-8. Additionally, a linear warm-up and a cosine annealing scheduler are incorporated in the training process. For the SIIM [32] dataset, the default batch size is set at 8, while for the RSNA [31] dataset, it is set at 16.
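
To make the quoted setup concrete, below is a minimal PyTorch sketch of the described optimisation schedule: AdamW with learning rate 2×10^-4 and weight decay 1×10^-8, a linear warm-up followed by cosine annealing, 50 epochs, and a per-GPU batch size of 128 (with 16 GPUs, an effective global batch size of 16 × 128 = 2,048). The placeholder model, the number of steps per epoch, the warm-up length, and the warm-up start factor are assumptions for illustration; they are not specified in the quote, and this is not the authors' released implementation.

# Hedged sketch of the reported pre-training optimisation setup.
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = nn.Linear(512, 512)      # placeholder for the G2D encoders (assumption)
epochs = 50                      # reported number of pre-training epochs
steps_per_epoch = 100            # assumed; depends on dataset size and batch size
warmup_steps = steps_per_epoch   # assumed: warm up for one epoch
total_steps = epochs * steps_per_epoch
# Per-GPU batch size 128 on 16 GPUs -> effective global batch size 2,048.

optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=1e-8)

# Linear warm-up, then cosine annealing for the remaining steps.
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_steps),
        CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps),
    ],
    milestones=[warmup_steps],
)

for step in range(total_steps):
    # forward pass, loss computation, and loss.backward() omitted
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()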