MEDICAL IMAGE UNDERSTANDING WITH PRETRAINED VISION LANGUAGE MODELS: A COMPREHENSIVE STUDY
Authors: Ziyuan Qin, Huahui Yi, Qicheng Lao, Kang Li
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on thirteen different medical datasets across various modalities, showing that our well-designed prompts greatly improve the zero-shot performance compared to the default prompts, and our fine-tuned models surpass the supervised models by a significant margin. |
| Researcher Affiliation | Academia | Ziyuan Qin1 Huahui Yi1 Qicheng Lao2,4 Kang Li1,3,4 1West China Biomedical Big Data Center, West China Hospital, Sichuan University 2School of Artificial Intelligence, BUPT 3Sichuan University Pittsburgh Institute 4Shanghai AI-Lab |
| Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code and more information could be found at https://github.com/MembrLab/MIU-VL |
| Open Datasets | Yes | For non-radiology images... The ISIC-16 dataset consists of 1,279 images with 1,282 bboxes... divided into 720/180/379 images for training, validation, and testing. The DFUC2020 dataset... divided into 1,280/320/400 images for training, validation, and testing... The BCCD dataset... split into training, validation, and test sets with 765, 73, and 36 images, respectively. |
| Dataset Splits | Yes | The ISIC-16 dataset consists of 1,279 images with 1,282 bboxes... divided into 720/180/379 images for training, validation, and testing. The DFUC2020 dataset... divided into 1,280/320/400 images for training, validation, and testing... The BCCD dataset... split into training, validation, and test sets with 765, 73, and 36 images, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or processor types used for running the experiments. It only mentions using a 'visual backbone' and 'linguistic backbone'. |
| Software Dependencies | No | The paper mentions software components like "Pubmed Bert-base-uncased variant", "OFA-base variant", and "MMDetection framework". However, it does not provide specific version numbers for these components as required for reproducibility. |
| Experiment Setup | Yes | We train our models using Adam optimizer with base learning rate of 1 × 10−4 (1 × 10−5 for the BERT text encoder), and the weight decay is set to 0.05. We freeze the bottom two layers of the image encoder and decay the learning rate by 0.1 when the validation performance plateaus. |
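The training recipe quoted in the Experiment Setup row can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the stand-in `image_encoder` and `text_encoder` modules, and the way "bottom two layers" is selected, are assumptions made for the sake of a runnable example.

```python
import torch
from torch import nn, optim

# Illustrative stand-ins for the paper's backbones (assumptions, not the
# authors' architecture): a 4-layer image encoder and a BERT-like text encoder.
image_encoder = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])
text_encoder = nn.Linear(8, 8)

# Freeze the bottom two layers of the image encoder, as described in the paper.
for layer in list(image_encoder.children())[:2]:
    for p in layer.parameters():
        p.requires_grad = False

# Adam with a base learning rate of 1e-4, 1e-5 for the text encoder,
# and weight decay 0.05 (values quoted from the paper).
optimizer = optim.Adam(
    [
        {"params": [p for p in image_encoder.parameters() if p.requires_grad],
         "lr": 1e-4},
        {"params": text_encoder.parameters(), "lr": 1e-5},
    ],
    weight_decay=0.05,
)

# Decay the learning rate by a factor of 0.1 when validation performance
# plateaus; call scheduler.step(val_metric) once per validation epoch.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
```

In a training loop, `scheduler.step(val_loss)` would be called after each validation pass so the 0.1 decay triggers only when the monitored metric stops improving.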