Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Detecting Any instruction-to-answer interaction relationship: Universal Instruction-to-Answer Navigator for Med-VQA
Authors: Zhongze Wu, Hongyan Xu, Yitian Long, Shan You, Xiu Su, Jun Long, Yueyi Luo, Chang Xu
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiment Results |
| Researcher Affiliation | Collaboration | 1 Central South University, Changsha, Hunan, China; 2 University of New South Wales, Sydney, Australia; 3 Vanderbilt University, Nashville, Tennessee, USA; 4 SenseTime; 5 University of Sydney, Sydney, Australia. |
| Pseudocode | Yes | Algorithm 1 Token-Level Cut-Mix (TC-Mix) |
| Open Source Code | No | The paper states 'we have made the IAI-Med VQA dataset publicly available', but does not provide concrete access to the source code for the Uni-Med framework itself. |
| Open Datasets | Yes | We use the PMC-VQA dataset (Zhang et al., 2023b), which includes 227K VQA pairs from 149K images... For fine-tuning, we used two medical datasets: VQA-RAD (Nguyen et al., 2019a)... and SLAKE (Liu et al., 2021b)... |
| Dataset Splits | No | For fine-tuning, we used two medical datasets: VQA-RAD (Nguyen et al., 2019a), consisting of 314 radiology images and 3,064 clinician-curated question-and-answer pairs; and SLAKE (Liu et al., 2021b), which offers 642 radiology images and 14K question-and-answer samples, of which we used 70% for training and 30% for testing. |
| Hardware Specification | Yes | The model is trained using the AdamW optimizer, combined with a cosine learning rate scheduler, across 8 Tesla V100 GPUs over 8,000 steps. |
| Software Dependencies | No | The paper mentions 'AdamW optimizer' and 'cosine learning rate scheduler' but does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | We set a global batch size of 128 and a peak learning rate of 2e-5 to optimize performance. |
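
To make the reported training setup concrete, the sketch below computes a cosine learning-rate schedule from the values quoted in the table (peak learning rate 2e-5, 8,000 steps, global batch size 128, 8 GPUs). It is an illustration only, not the authors' code: the function name `cosine_lr`, the absence of warmup, the floor of 0.0, and the even per-GPU batch split without gradient accumulation are all assumptions, since the quoted excerpts do not state them.

```python
import math

# Values quoted in the table above.
PEAK_LR = 2e-5
TOTAL_STEPS = 8000
GLOBAL_BATCH = 128
NUM_GPUS = 8


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, min_lr=0.0):
    """Return the learning rate at `step` under plain cosine decay.

    Assumptions (not stated in the source): no warmup phase, and decay
    to `min_lr` = 0.0 at the final step.
    """
    # Clamp so that out-of-range steps do not push the cosine past its half period.
    step = min(max(step, 0), total_steps)
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Assumed even split of the global batch across GPUs, no gradient accumulation.
per_gpu_batch = GLOBAL_BATCH // NUM_GPUS

if __name__ == "__main__":
    for s in (0, 2000, 4000, 6000, 8000):
        print(f"step {s:>5}: lr = {cosine_lr(s):.3e}")
    print("per-GPU batch size (assumed):", per_gpu_batch)
```

Under these assumptions the rate starts at the peak, passes through half the peak at step 4,000, and reaches the floor at step 8,000. If the paper's schedule includes warmup or a nonzero floor, the curve would differ accordingly.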