On Evaluating Adversarial Robustness of Large Vision-Language Models

Authors: Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we empirically evaluate the adversarial robustness of state-of-the-art large VLMs, particularly those that accept visual inputs (e.g., image-grounded text generation or joint generation). To ensure reproducibility, our evaluations are all based on open-source large models. We examine the most realistic and high-risk scenario, in which adversaries have only black-box system access and seek to deceive the model into returning the targeted responses.
Researcher Affiliation | Collaboration | Yunqing Zhao (1), Tianyu Pang (2), Chao Du (2), Xiao Yang (3), Chongxuan Li (4), Ngai-Man Cheung (1), Min Lin (2). (1) Singapore University of Technology and Design; (2) Sea AI Lab, Singapore; (3) Tsinghua University; (4) Renmin University of China
Pseudocode | Yes | Algorithm 1: Adversarial attack against large VLMs (Figure 4)
Open Source Code | Yes | Our project page: yunqing-me.github.io/AttackVLM/
Open Datasets | Yes | We use the validation images from ImageNet-1K [20] as clean images, from which adversarial examples are crafted, to quantitatively evaluate the adversarial robustness of large VLMs. From MS-COCO captions [44], we randomly select a text description (usually a complete sentence, as shown in our Appendix) as the adversarially targeted text for each clean image.
Dataset Splits | No | The paper states that it uses "validation images from ImageNet-1K" as clean images to be attacked, but it does not specify a training/validation/test split for its own experimental methodology or models. The attack itself does not have a traditional training phase with such splits.
Hardware Specification | Yes | Every experiment is run on a single NVIDIA-A100 GPU. ... NVIDIA A100 PCIe (40GB)
Software Dependencies | No | The paper mentions various models and frameworks (e.g., CLIP, BLIP, UniDiffuser, GPT-4, PyTorch, Stable Diffusion, DALL-E, Midjourney, T5, Vicuna-13B) but does not provide specific version numbers for the software dependencies used in its implementation (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We set ϵ = 8 and use the ℓ∞ constraint by default, i.e., ‖x_cle − x_adv‖_∞ ≤ ϵ = 8, which is the most commonly used setting in the adversarial literature [12], to ensure that the adversarial perturbations are visually imperceptible where the pixel values are in the range [0, 255]. We use 100-step PGD to optimize transfer-based attacks (the objectives in Eq. (1) and Eq. (2)). In each step of query-based attacks, we set the number of queries N = 100 in Eq. (4) and update the adversarial images by 8-step PGD using the estimated gradient.
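
For context on the Pseudocode and Experiment Setup rows above, the transfer-based stage amounts to 100-step PGD under an ℓ∞ budget of ϵ = 8 (i.e., 8/255 for pixels scaled to [0, 1]), optimizing the objectives in Eq. (1)/(2). The following is a minimal PyTorch sketch of that stage only, assuming a CLIP-style surrogate image encoder and a precomputed target embedding; the names `transfer_attack`, `surrogate_encoder`, and `target_emb` are illustrative and not taken from the authors' released code.

import torch
import torch.nn.functional as F

def transfer_attack(x_cle, target_emb, surrogate_encoder,
                    eps=8/255, steps=100, step_size=1/255):
    """Hypothetical sketch: 100-step PGD that pushes the surrogate embedding of
    x_adv toward a fixed target embedding, within an l_inf ball of radius eps.
    x_cle: clean image batch in [0, 1]; target_emb: embedding of the targeted
    image (or text) produced offline."""
    x_adv = x_cle.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        emb = surrogate_encoder(x_adv)                       # e.g., a CLIP image encoder
        loss = F.cosine_similarity(emb, target_emb, dim=-1).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()          # ascend: increase similarity
            x_adv = x_cle + (x_adv - x_cle).clamp(-eps, eps) # project into the l_inf ball
            x_adv = x_adv.clamp(0, 1)                        # keep a valid pixel range
    return x_adv.detach()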
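The same row describes the query-based stage: a gradient estimate obtained from N = 100 black-box queries (Eq. (4)), followed by an 8-step PGD update using that estimate. Below is a hedged sketch of one such outer iteration, assuming a scalar black-box score `vlm_text_score(x)` that queries the victim VLM on image x and returns the similarity of its response to the targeted text; this oracle, the RGF-style finite-difference estimator shown, and all parameter defaults are illustrative assumptions, not the authors' exact implementation.

import torch

def estimate_gradient(x_adv, vlm_text_score, num_queries=100, sigma=8/255):
    """Hypothetical RGF-style estimator for a single image tensor: average
    finite differences of the black-box score along random unit directions."""
    base = vlm_text_score(x_adv)                     # one query on the current image
    grad_est = torch.zeros_like(x_adv)
    for _ in range(num_queries):
        u = torch.randn_like(x_adv)
        u = u / u.norm()                             # random unit direction
        score = vlm_text_score(x_adv + sigma * u)    # one black-box query
        grad_est += (score - base) / sigma * u       # finite-difference term
    return grad_est / num_queries

def query_attack_step(x_cle, x_adv, vlm_text_score,
                      eps=8/255, pgd_steps=8, step_size=1/255):
    """One outer iteration, under one reading of the setup row: a single
    100-query gradient estimate, then 8 projected sign steps with it."""
    g = estimate_gradient(x_adv, vlm_text_score)
    for _ in range(pgd_steps):
        x_adv = x_adv + step_size * g.sign()             # ascend on the estimated gradient
        x_adv = x_cle + (x_adv - x_cle).clamp(-eps, eps) # stay inside the l_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv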