On Evaluating Adversarial Robustness of Large Vision-Language Models
Authors: Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we empirically evaluate the adversarial robustness of state-of-the-art large VLMs, particularly against those that accept visual inputs (e.g., image-grounded text generation or joint generation). To ensure reproducibility, our evaluations are all based on open-source large models. We examine the most realistic and high-risk scenario, in which adversaries have only black-box system access and seek to deceive the model into returning the targeted responses. |
| Researcher Affiliation | Collaboration | Yunqing Zhao¹, Tianyu Pang², Chao Du², Xiao Yang³, Chongxuan Li⁴, Ngai-Man Cheung¹, Min Lin²; ¹Singapore University of Technology and Design, ²Sea AI Lab, Singapore, ³Tsinghua University, ⁴Renmin University of China |
| Pseudocode | Yes | Algorithm 1 Adversarial attack against large VLMs (Figure 4) |
| Open Source Code | Yes | Our project page: yunqing-me.github.io/AttackVLM/ |
| Open Datasets | Yes | We use the validation images from ImageNet-1K [20] as clean images, from which adversarial examples are crafted, to quantitatively evaluate the adversarial robustness of large VLMs. From MS-COCO captions [44], we randomly select a text description (usually a complete sentence, as shown in our Appendix) as the adversarially targeted text for each clean image. |
| Dataset Splits | No | The paper states that it uses "validation images from ImageNet-1K" as clean images to be attacked, but it does not specify a training/validation/test split for its own experimental methodology or models. The attack itself does not have a traditional training phase with such splits. |
| Hardware Specification | Yes | Every experiment is run on a single NVIDIA-A100 GPU. ... NVIDIA A100 PCIe (40GB) |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., CLIP, BLIP, UniDiffuser, GPT-4, PyTorch, Stable Diffusion, DALL-E, Midjourney, T5, Vicuna-13B) but does not provide specific version numbers for the software dependencies used in their implementation (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set ϵ = 8 and use the ℓ_p constraint by default as ∥x_cle − x_adv∥_p ≤ ϵ = 8, which is the most commonly used setting in the adversarial literature [12], to ensure that the adversarial perturbations are visually imperceptible, where the pixel values are in the range [0, 255]. We use 100-step PGD to optimize transfer-based attacks (the objectives in Eq. (1) and Eq. (2)). In each step of query-based attacks, we set query times N = 100 in Eq. (4) and update the adversarial images by 8-step PGD using the estimated gradient. |
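
The setup above describes a transfer-based stage: 100-step PGD that optimizes the objectives in Eq. (1)/(2) against a surrogate encoder under the perturbation budget ϵ = 8. The sketch below is a minimal PyTorch illustration of such a loop, assuming the common ℓ∞ reading of the budget (8/255 after normalizing pixels to [0, 1]); the `surrogate` encoder, `target_feat`, the step size `alpha`, and the function name itself are hypothetical placeholders, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def pgd_transfer_attack(x_cle, target_feat, surrogate, eps=8/255, steps=100, alpha=1/255):
    """100-step PGD sketch: push the surrogate embedding of the adversarial image
    toward a feature of the adversarially targeted text/image.
    `surrogate`, `target_feat`, and `alpha` are assumed placeholders."""
    x_adv = x_cle.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Ascend on the similarity between the adversarial image feature and the target feature.
        loss = F.cosine_similarity(surrogate(x_adv), target_feat, dim=-1).sum()
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                # signed gradient step
            x_adv = x_cle + (x_adv - x_cle).clamp(-eps, eps)   # project into the epsilon ball
            x_adv = x_adv.clamp(0, 1)                          # keep a valid image
    return x_adv.detach()
```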
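
The query-based stage estimates the gradient from N = 100 black-box queries (Eq. (4)) and then updates the adversarial image by 8-step PGD with that estimate. The following is a minimal sketch under the same assumptions; `black_box_score` (a scalar measuring how close the victim VLM's response is to the targeted text), the smoothing radius `sigma`, and the step size `alpha` are hypothetical stand-ins, and the reading that one estimate drives all 8 PGD steps is an assumption from the quoted setup.

```python
import torch

def estimate_gradient(x_adv, black_box_score, n_queries=100, sigma=8/255):
    """Zeroth-order gradient estimate averaged over N black-box queries (RGF-style sketch)."""
    base = black_box_score(x_adv)
    grad = torch.zeros_like(x_adv)
    for _ in range(n_queries):
        u = torch.randn_like(x_adv)                            # random search direction
        grad += (black_box_score(x_adv + sigma * u) - base) / sigma * u
    return grad / n_queries

def query_attack_round(x_cle, x_adv, black_box_score, eps=8/255, pgd_steps=8, alpha=1/255):
    """One round of the query-based attack: a single N=100-query gradient estimate,
    followed by an 8-step PGD update under the same epsilon constraint."""
    grad = estimate_gradient(x_adv, black_box_score)
    for _ in range(pgd_steps):
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x_cle + (x_adv - x_cle).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```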