Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training
Authors: Che Liu, Cheng Ouyang, Sibo Cheng, Anand Shah, Wenjia Bai, Rossella Arcucci
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments to validate the efficacy of the proposed G2D approach, which outperforms peer approaches across five uni-modal and cross-modal downstream tasks. In this section, we compare our approach with SOTA medical VLP techniques. The implementation details and dataset training/test splits are reported in Sec A.3, A.4. |
| Researcher Affiliation | Collaboration | Che Liu1,2, Cheng Ouyang3,8,9, Sibo Cheng10, Anand Shah6,7, Wenjia Bai2,3,4, Rossella Arcucci1,2. 1 Department of Earth Science and Engineering, Imperial College London, UK; 2 Data Science Institute, Imperial College London, UK; 3 Department of Computing, Imperial College London, UK; 4 Department of Brain Sciences, Imperial College London, UK; 6 Department of Infectious Disease Epidemiology, Imperial College London, UK; 7 Royal Brompton and Harefield Hospitals, UK; 8 Department of Engineering Science, University of Oxford, Oxford, UK; 9 Institute of Clinical Sciences, Imperial College London, UK; 10 CEREA, École des Ponts and EDF R&D, Île-de-France, France. |
| Pseudocode | No | The paper describes the methodology using textual descriptions and diagrams (e.g., Figure 2) but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | The code can be found at https://github.com/cheliucomputation/G2D-NeurIPS24/tree/main. |
| Open Datasets | Yes | We utilise the MIMIC-CXR dataset [29, 24]. For downstream tasks, our focus is to evaluate the efficacy of G2D in learning granular visual features that can be used for localisation, vision-language understanding, and visual recognition tasks. We examine the capability and transferability of the learned cross-modal representations by using them for five distinct medical imaging tasks, covering a spectrum of 25 different diseases. |
| Dataset Splits | Yes | Table 6: Details on Data Split: The symbol / denotes that training/validation data is not required for the zero-shot tasks. Task: Linear Classification; Dataset: CheXpert [35]; Split: [35]; Train: 186,027; Valid: 5,000; Test: 202. |
| Hardware Specification | Yes | In line with [9, 8], G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The downstream tasks are deployed on a 40G A100 GPU. |
| Software Dependencies | No | The paper mentions using the "PyTorch vision library" but does not specify a version number for PyTorch or the vision library itself. It also refers to "ClinicalBERT" as the text encoder, but without a specific version. |
| Experiment Setup | Yes | G2D is pre-trained for 50 epochs across 16 A100 GPUs, each accommodating a batch size of 128. The AdamW optimizer is employed with a learning rate set to $2 \times 10^{-4}$ and a weight decay of $1 \times 10^{-8}$. Additionally, a linear warm-up and a cosine annealing scheduler are incorporated in the training process. For the SIIM [32] dataset, the default batch size is set at 8, while for the RSNA [31] dataset, it is set at 16. |
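
The Experiment Setup row names a linear warm-up followed by cosine annealing but gives no warm-up length or step count. The sketch below shows one common way such a schedule is computed, in plain Python. It is not taken from the G2D code: `lr_at_step`, `effective_batch_size`, `warmup_steps`, `total_steps`, and `min_lr` are hypothetical names and values chosen for illustration. Only the base learning rate, the GPU count, and the per-GPU batch size come from the reported setup. The effective batch size further assumes no gradient accumulation.

```python
import math

# Values reported in the Experiment Setup row.
BASE_LR = 2e-4
NUM_GPUS = 16
PER_GPU_BATCH = 128


def effective_batch_size(num_gpus=NUM_GPUS, per_gpu_batch=PER_GPU_BATCH):
    """Global batch size, assuming plain data parallelism with no
    gradient accumulation (an assumption, not stated in the source)."""
    return num_gpus * per_gpu_batch


def lr_at_step(step, total_steps, warmup_steps, base_lr=BASE_LR, min_lr=0.0):
    """Learning rate at a 0-indexed optimizer step.

    Linear warm-up up to base_lr over `warmup_steps`, then cosine
    annealing down to `min_lr` at `total_steps`. The step counts and
    min_lr are hypothetical; the source does not report them.
    """
    if step < warmup_steps:
        # Linear ramp: reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical schedule length, for illustration only.
    total, warmup = 1000, 100
    for s in (0, 50, 99, 100, 550, 1000):
        print(s, lr_at_step(s, total, warmup))
    print("effective batch size:", effective_batch_size())
```

In a PyTorch training loop, the same function could be wrapped as a multiplier for `torch.optim.lr_scheduler.LambdaLR`, although whether the authors did so is not stated in the source.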