Visual Attention Prompted Prediction and Learning

Authors: Yifei Zhang, Bo Pan, Siyi Gu, Guangji Bai, Meikang Qiu, Xiaofeng Yang, Liang Zhao

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompt.
Researcher Affiliation | Academia | 1 Emory University, 2 Stanford University, 3 Augusta University. {yifei.zhang2, bo.pan, guangji.bai, xyang43, liang.zhao}@emory.edu, sgu33@stanford.edu, qiumeikang@yahoo.com
Pseudocode | Yes | Algorithm 1: Alternating Training
Open Source Code | Yes | Code and tools are available at https://github.com/yifeizhangcs/visual-attention-prompt
Open Datasets | Yes | We employed four datasets: two from real-world scenarios, sourced from MS COCO [Lin et al., 2014], and two from the medical field, namely LIDC-IDRI (LIDC) [Armato III et al., 2011] and the Pancreas dataset [Roth et al., 2015].
Dataset Splits | Yes | The final dataset included 2625 nodule and 65505 non-nodule images, split into 100/1200/1200 for training, validation, and testing to reflect limited access to human explanations. ... Data was split into 30/30/rest for training, validation, and testing, maintaining class balance. (A split sketch follows the table.)
Hardware Specification | Yes | Regarding computational resources, all experiments were executed using an NVIDIA GTX 3090 GPU.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The experimental setup was standardized with a batch size of 16, and the number of perturbed masks was set to 5000. Furthermore, a pixel conversion probability of 0.1 was established. The training was conducted over 10 epochs, each comprising 5 iterations for the alternating updating phase, effectively resulting in 50 training epochs for each model. The Adam optimization algorithm [Kingma and Ba, 2014] was utilized with a learning rate of 0.0001. (A training-loop sketch follows the table.)
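The class-balanced splits quoted in the Dataset Splits row (100/1200/1200 for the LIDC nodule task, 30/30/rest for the MS COCO-derived tasks) can be reproduced with a per-class sampling routine. The sketch below is only an assumed illustration: the function name balanced_split, the fixed seed, and the equal-per-class quota logic are not taken from the authors' code.

```python
# Hypothetical helper for the class-balanced splits described above; not the authors' code.
import random
from collections import defaultdict

def balanced_split(labels, n_train, n_val, n_test=None, seed=0):
    """Split sample indices into train/val/test, with each class contributing an
    equal share to the train and val splits. If n_test is None, all remaining
    samples go to the test split (the "rest" convention)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    n_classes = len(by_class)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        t = n_train // n_classes              # e.g. 50 per class for a 100-image train split
        v = n_val // n_classes                # e.g. 600 per class for a 1200-image val split
        end = None if n_test is None else t + v + n_test // n_classes
        train += idxs[:t]
        val += idxs[t:t + v]
        test += idxs[t + v:end]
    return train, val, test

# LIDC-style split (100/1200/1200):
# train_idx, val_idx, test_idx = balanced_split(labels, n_train=100, n_val=1200, n_test=1200)
# MS COCO-derived tasks (30/30/rest):
# train_idx, val_idx, test_idx = balanced_split(labels, n_train=30, n_val=30)
```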
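The hyperparameters in the Experiment Setup row map directly onto a training loop. The sketch below is a minimal, assumed reconstruction: the split into a predictor and an attention (prompt) module, the way the two alternate, and every name here (alternating_train, attention_module, random_masks, the (images, prompts, labels) batch format) are illustrative assumptions; the paper's Algorithm 1 defines the actual alternating-training procedure. Only the quoted values (batch size 16, 5000 perturbed masks, pixel conversion probability 0.1, 10 x 5 alternating passes, Adam with learning rate 0.0001) come from the paper.

```python
# Sketch of the reported training configuration; module structure is an assumption.
import torch
from torch.utils.data import DataLoader

BATCH_SIZE = 16       # reported batch size
NUM_MASKS = 5000      # reported number of perturbed masks
PIXEL_PROB = 0.1      # reported pixel conversion probability
OUTER_EPOCHS = 10     # reported outer epochs
ALT_ITERS = 5         # alternating-update iterations per epoch (10 x 5 = 50 passes)
LR = 1e-4             # reported Adam learning rate

def random_masks(height, width, n_masks=NUM_MASKS, p=PIXEL_PROB):
    # Bernoulli perturbation masks: each pixel is switched on with probability p.
    # How these masks enter the attention estimate is defined by Algorithm 1 in the paper.
    return (torch.rand(n_masks, 1, height, width) < p).float()

def alternating_train(predictor, attention_module, dataset, loss_fn, device="cuda"):
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    opt_pred = torch.optim.Adam(predictor.parameters(), lr=LR)
    opt_attn = torch.optim.Adam(attention_module.parameters(), lr=LR)

    for epoch in range(OUTER_EPOCHS):
        for it in range(ALT_ITERS):
            update_predictor = (it % 2 == 0)   # which module is updated this pass (assumption)
            for images, prompts, labels in loader:
                images, prompts, labels = images.to(device), prompts.to(device), labels.to(device)
                attn = attention_module(images, prompts)   # estimated attention map
                logits = predictor(images, attn)           # attention-conditioned prediction
                loss = loss_fn(logits, labels)

                opt_pred.zero_grad()
                opt_attn.zero_grad()
                loss.backward()
                (opt_pred if update_predictor else opt_attn).step()
```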