Evidential Interactive Learning for Medical Image Captioning

Authors: Ervine Zheng, Qi Yu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two medical image datasets illustrate that the proposed framework can effectively learn from human feedback and improve performance in the future.
Researcher Affiliation | Academia | Ervine Zheng and Qi Yu, Rochester Institute of Technology. Correspondence to: Qi Yu <qi.yu@rit.edu>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code is provided at https://github.com/ritmininglab/EIL-MIC.
Open Datasets | Yes | We evaluate the proposed method on medical image captioning datasets. The IU X-RAY (Young et al., 2014) dataset includes a collection of radiology examinations, including images and narrative reports by radiologists. The PEIR Gross (Library, 2022) dataset is released by the Pathology Education Informational Resource digital library and includes teaching images of gross lesions along with their associated captions.
Dataset Splits | Yes | We use five-fold cross-validation for hyperparameter tuning. (A cross-validation sketch appears after the table.)
Hardware Specification | Yes | For experiments, the proposed method and baselines are trained with an Intel Core i7-3820 CPU and an NVIDIA GeForce RTX 2070 GPU.
Software Dependencies | No | The paper mentions the use of 'EfficientNet' and 'Adam optimizer' but does not specify version numbers for general software dependencies like Python, PyTorch, or other relevant libraries.
Experiment Setup | Yes | The embedding dimension is tuned via grid search and set to 512, and the number of attention heads is tuned and set to 2. The number of stacked transformer blocks for keyword prediction and caption generation is tuned and set to 4. λ is set to 1. We use stochastic gradient descent and Adam optimizer with a learning rate scheduled from 5e-5 to 1e-5.
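
The Experiment Setup row above lists the reported hyperparameters: a 512-dimensional embedding, 2 attention heads, 4 stacked transformer blocks, λ = 1, and the Adam optimizer with a learning rate scheduled from 5e-5 to 1e-5. Below is a minimal PyTorch sketch of that configuration, assuming a generic transformer decoder stack; it is not taken from the authors' EIL-MIC repository, and names such as caption_decoder, LAMBDA, and the LinearLR schedule are illustrative assumptions.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
import torch
import torch.nn as nn

EMBED_DIM = 512   # embedding dimension, grid-searched and set to 512
NUM_HEADS = 2     # number of attention heads, tuned and set to 2
NUM_LAYERS = 4    # stacked transformer blocks for keyword prediction / caption generation
LAMBDA = 1.0      # loss-balancing weight lambda reported as 1 (its exact role follows the paper)

# A generic transformer decoder stack with the reported sizes.
decoder_layer = nn.TransformerDecoderLayer(d_model=EMBED_DIM, nhead=NUM_HEADS, batch_first=True)
caption_decoder = nn.TransformerDecoder(decoder_layer, num_layers=NUM_LAYERS)

# Adam optimizer with the learning rate decayed from 5e-5 toward 1e-5.
optimizer = torch.optim.Adam(caption_decoder.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=1.0,
    end_factor=1e-5 / 5e-5,   # final LR = 1e-5 when starting from 5e-5
    total_iters=100,          # illustrative schedule length; the paper does not report one
)
```

A linear decay is only one plausible choice here; the paper gives the start and end of the learning-rate schedule but not its exact form.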
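
The Dataset Splits row quotes five-fold cross-validation for hyperparameter tuning. The following is a small, self-contained sketch of that protocol using scikit-learn's KFold; select_hyperparameter, fit_and_score, and the toy scoring function are hypothetical placeholders rather than anything from the paper or its code base.

```python
# Illustrative five-fold cross-validation loop for hyperparameter tuning.
import numpy as np
from sklearn.model_selection import KFold

def select_hyperparameter(samples, candidates, fit_and_score, n_splits=5, seed=0):
    """Return the candidate value with the best mean validation score across folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best_value, best_score = None, float("-inf")
    for value in candidates:
        scores = [
            fit_and_score(samples[train_idx], samples[val_idx], value)
            for train_idx, val_idx in kf.split(samples)
        ]
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_value, best_score = value, mean_score
    return best_value

# Toy usage: fit_and_score would normally train the captioning model on the
# training folds and return a validation metric (e.g., BLEU) on the held-out fold.
if __name__ == "__main__":
    data = np.arange(100)
    toy_score = lambda train, val, value: -abs(value - 512)  # dummy metric favoring 512
    print(select_hyperparameter(data, candidates=[256, 512, 1024], fit_and_score=toy_score))
```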