G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
Authors: Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to validate the efficacy of the proposed G2D approach, which outperforms peer approaches across five uni-modal and cross-modal downstream tasks. In this section, we compare our approach with SOTA medical VLP techniques. The implementation details and dataset training/test splits are reported in Sec A.3, A.4. |
| Researcher Affiliation | Collaboration | Che Liu (1,2), Cheng Ouyang (3,8,9), Sibo Cheng (10), Anand Shah (6,7), Wenjia Bai (2,3,4), Rossella Arcucci (1,2). 1: Department of Earth Science and Engineering, Imperial College London, UK; 2: Data Science Institute, Imperial College London, UK; 3: Department of Computing, Imperial College London, UK; 4: Department of Brain Sciences, Imperial College London, UK; 6: Department of Infectious Disease Epidemiology, Imperial College London, UK; 7: Royal Brompton and Harefield Hospitals, UK; 8: Department of Engineering Science, University of Oxford, Oxford, UK; 9: Institute of Clinical Sciences, Imperial College London, UK; 10: CEREA, École des Ponts and EDF R&D, Île-de-France, France. |
| Pseudocode | No | The paper describes the methodology using textual descriptions and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code can be found at https://github.com/cheliucomputation/G2D-NeurIPS24/tree/main. |
| Open Datasets | Yes | We utilise the MIMIC-CXR dataset [29, 24]. For downstream tasks, our focus is to evaluate the efficacy of G2D in learning granular visual features that can be used for localisation, vision-language understanding, and visual recognition tasks. We examine the capability and transferability of the learned cross-modal representations by using them for five distinct medical imaging tasks, covering a spectrum of 25 different diseases. |
| Dataset Splits | Yes | Table 6 ("Details on Data Split"; the symbol / denotes that training/validation data is not required for the zero-shot tasks) lists the split per task, e.g. Linear Classification on CheXpert [35], using the split from [35]: 186,027 train / 5,000 valid / 202 test. |
| Hardware Specification | Yes | In line with [9, 8], G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The downstream tasks are deployed on a 40G A100 GPU. |
| Software Dependencies | No | The paper mentions using the "PyTorch vision library" but does not specify a version number for PyTorch or the vision library itself. It also refers to "ClinicalBERT" as the text encoder, but without a specific version. |
| Experiment Setup | Yes | G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The AdamW optimizer is employed with a learning rate set to 2×10⁻⁴ and a weight decay of 1×10⁻⁸. Additionally, a linear warm-up and a cosine annealing scheduler are incorporated in the training process. For the SIIM [32] dataset, the default batch size is set at 8, while for the RSNA [31] dataset, it is set at 16. (A hedged sketch of this optimisation recipe appears below the table.) |
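
For readers attempting reproduction, here is a minimal PyTorch sketch of the optimisation recipe quoted in the Experiment Setup row. Only the optimizer choice (AdamW), learning rate (2×10⁻⁴), weight decay (1×10⁻⁸), epoch count, and per-GPU batch size come from the paper; the stand-in model, dummy loss, steps per epoch, and warm-up length are placeholder assumptions, since the paper does not report them at this granularity.

```python
# Minimal sketch of the reported pre-training recipe: AdamW with lr 2e-4 and
# weight decay 1e-8, a linear warm-up followed by cosine annealing, 50 epochs,
# per-GPU batch size 128. Everything else (model, loss, step counts, warm-up
# length) is a placeholder assumption, not taken from the paper.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

EPOCHS = 50               # reported pre-training length
BATCH_SIZE = 128          # reported per-GPU batch size
STEPS_PER_EPOCH = 100     # placeholder: depends on dataset size and GPU count
WARMUP_STEPS = 500        # assumption: the warm-up duration is not reported
TOTAL_STEPS = EPOCHS * STEPS_PER_EPOCH

model = torch.nn.Linear(512, 512)  # stand-in for the G2D image-text encoders

optimizer = AdamW(model.parameters(), lr=2e-4, weight_decay=1e-8)

# Ramp the lr linearly from near zero up to the base value, then anneal it
# with a cosine schedule over the remaining steps.
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=WARMUP_STEPS)
cosine = CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS - WARMUP_STEPS)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                         milestones=[WARMUP_STEPS])

for step in range(TOTAL_STEPS):
    batch = torch.randn(BATCH_SIZE, 512)   # dummy batch in place of MIMIC-CXR
    loss = model(batch).pow(2).mean()      # dummy loss in place of the VLP loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                       # per-step schedule update
```

In the actual 16-GPU runs the model would typically be wrapped in DistributedDataParallel; that machinery is omitted here to keep the sketch self-contained and runnable on a single device.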