VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models

Authors: Ziyi Yin, Muchao Ye, Tianrong Zhang, Jiaqi Wang, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQATTACK in the transferable attack setting, compared with state-of-the-art baselines. This work reveals a significant blind spot in the pre-training & fine-tuning paradigm on VQA tasks."
Researcher Affiliation | Academia | The Pennsylvania State University; Dalian University of Technology; Stony Brook University
Pseudocode | Yes | Algorithm 1: The proposed VQATTACK
Open Source Code | Yes | "The source code can be found in the link https://github.com/ericyinyzy/VQAttack."
Open Datasets | Yes | "We evaluate the proposed VQATTACK on the VQAv2 (Antol et al. 2015) and TextVQA (Singh et al. 2019) datasets."
Dataset Splits | No | "We randomly select 6,000 and 1,000 correctly predicted samples from the VQAv2 and TextVQA validation datasets, respectively." The paper uses existing validation sets but does not detail the overall train/validation/test splits used for model training, nor how the attack samples were selected beyond "randomly select ... from the ... validation datasets".
Hardware Specification | No | The paper does not provide any specific hardware details, such as the GPU or CPU models used for running the experiments.
Software Dependencies | No | The paper mentions software such as BERT, ChatGPT, and the Natural Language Toolkit (NLTK) but does not provide specific version numbers for any of these dependencies.
Experiment Setup | No | The paper names hyperparameters such as the step size, the perturbation budgets for image and text, and the total number of iterations as inputs to Algorithm 1, but does not state the specific values used in the experiments in the main text.
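To make concrete how the three hyperparameters named as inputs to Algorithm 1 (step size, perturbation budget, total iterations) typically interact, here is a minimal sketch of a generic PGD-style image perturbation loop. This is an illustrative assumption, not the paper's VQATTACK update rule: the function name `pgd_perturb` and the toy gradient are hypothetical, and VQATTACK additionally perturbs text, which is not modeled here.

```python
import numpy as np

def pgd_perturb(x, grad_fn, step_size=0.01, budget=0.03, iters=10):
    """Generic sketch: iteratively ascend a loss gradient while keeping
    the perturbation inside an L-infinity ball of radius `budget`.
    NOT the paper's algorithm; shown only to illustrate how the three
    hyperparameters (step size, budget, iterations) play together."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x + delta)                   # gradient of a stand-in loss
        delta = delta + step_size * np.sign(g)   # signed-gradient ascent step
        delta = np.clip(delta, -budget, budget)  # enforce the L-inf budget
    return x + delta

# Toy stand-in "loss" gradient (pushes x away from the origin).
x0 = np.array([0.5, -0.2])
x_adv = pgd_perturb(x0, grad_fn=lambda x: 2.0 * x)
```

With 10 iterations of step size 0.01 and a budget of 0.03, the perturbation saturates at the budget after three steps, so the final perturbation magnitude is bounded by 0.03 per coordinate regardless of how many further iterations run.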