Detecting Any Instruction-to-Answer Interaction Relationship: Universal Instruction-to-Answer Navigator for Med-VQA

Authors: Zhongze Wu, Hongyan Xu, Yitian Long, Shan You, Xiu Su, Jun Long, Yueyi Luo, Chang Xu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiment Results
Researcher Affiliation | Collaboration | (1) Central South University, Changsha, Hunan, China; (2) University of New South Wales, Sydney, Australia; (3) Vanderbilt University, Nashville, Tennessee, USA; (4) SenseTime; (5) University of Sydney, Sydney, Australia.
Pseudocode | Yes | Algorithm 1 Token-Level Cut-Mix (TC-Mix)
Open Source Code | No | The paper states 'we have made the IAI-Med VQA dataset publicly available', but does not provide concrete access to the source code for the Uni-Med framework itself.
Open Datasets | Yes | We use the PMC-VQA dataset (Zhang et al., 2023b), which includes 227K VQA pairs from 149K images... For fine-tuning, we used two medical datasets: VQA-RAD (Nguyen et al., 2019a)... and SLAKE (Liu et al., 2021b)...
Dataset Splits | No | For fine-tuning, we used two medical datasets: VQA-RAD (Nguyen et al., 2019a), consisting of 314 radiology images and 3,064 clinician-curated question-and-answer pairs; and SLAKE (Liu et al., 2021b), which offers 642 radiology images and 14K question-and-answer samples, of which we used 70% for training and 30% for testing.
Hardware Specification | Yes | The model is trained using the AdamW optimizer, combined with a cosine learning rate scheduler, across 8 Tesla V100 GPUs over 8,000 steps.
Software Dependencies | No | The paper mentions 'AdamW optimizer' and 'cosine learning rate scheduler' but does not specify software dependencies with version numbers.
Experiment Setup | Yes | We set a global batch size of 128 and a peak learning rate of 2e-5 to optimize performance.