Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

Authors: Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is a traditional Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is a novel Persuasion Attack, leveraging VLMs' text generation capabilities to craft persuasive and seemingly rational narratives for misinformation, such as portraying junk food as healthy. We show that Shadowcast effectively achieves the attacker's intentions using as few as 50 poison samples. Crucially, the poisoned samples demonstrate transferability across different VLM architectures, posing a significant concern in black-box settings. Moreover, Shadowcast remains potent under realistic conditions involving various text prompts, training data augmentation, and image compression techniques. This work reveals how poisoned VLMs can disseminate convincing yet deceptive misinformation to everyday, benign users, emphasizing the importance of data integrity for responsible VLM deployments.
Researcher Affiliation | Collaboration | 1 University of Maryland, College Park; 2 University of Illinois Urbana-Champaign; 3 Salesforce Research; 4 Apple; 5 University of Waterloo; 6 Netflix Eyeline Studios. Contact: ycxu@umd.edu
Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.
Open Datasets | Yes | For the clean training dataset, we use the cc-sbu-align dataset [Zhu et al., 2023a], which consists of 3,500 detailed image-description pairs and has been used for visual instruction tuning of MiniGPT-4 [Zhu et al., 2023a]. ... To collect the images used for the attack tasks, we design a web spider to gather images from Google's image search. We collect the images under Creative Commons licenses, which allow individuals to use, edit, and utilize them in non-profit projects.
Dataset Splits | No | The paper mentions using a 'test set' and a 'training set' for the attack tasks' images and refers to finetuning on the cc-sbu-align dataset, but it does not specify explicit validation splits (e.g., percentages or counts for training, validation, and test sets) for the main training or finetuning process. Benchmarks such as VizWiz and GQA are used for evaluation, not as a validation set during training.
Hardware Specification | Yes | On average, it takes 86 seconds to generate a poison image using the vision encoder of LLaVA-1.5 on an NVIDIA A4000 GPU.
Software Dependencies | No | The paper mentions software and models used, such as 'LLaVA-1.5', 'LoRA', 'MiniGPT-v2', 'InstructBLIP', and 'GPT-3.5-turbo'. However, it does not provide specific version numbers for software libraries or frameworks (e.g., PyTorch 1.x, Python 3.x), which would be necessary to fully reproduce the software environment.
Experiment Setup | Yes | For experiments in the grey-box setting, we primarily utilize LLaVA-1.5 [Liu et al., 2023b] as the pre-trained vision-language model for visual instruction tuning. We follow the official finetuning configuration of LLaVA-1.5, where the vision encoder is frozen and the language model with LoRA [Hu et al., 2021] is trained using the cosine learning rate schedule with a maximal learning rate of 0.0002. Each LLaVA-1.5 model is trained for one epoch with an effective batch size of 128. ... we use the perturbation budget of ϵ = 8/255 and run the projected gradient descent (PGD) optimizer for 2000 steps with a step size of 0.2/255, which decreases to 0.1/255 at step 1000. (A rough code sketch of this poison-crafting step follows this table.)
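
To make the reported crafting procedure concrete, here is a minimal PGD sketch in PyTorch that follows the hyperparameters quoted in the Experiment Setup row (ϵ = 8/255, 2000 steps, step size 0.2/255 dropping to 0.1/255 at step 1000). It assumes a frozen vision encoder `encoder` that returns a feature vector per image, an original-concept image `x_orig`, and a destination-concept image `x_dest` whose features the poison image should match; the function name and the L2 feature-matching loss are illustrative assumptions, not the authors' actual implementation from the released repository.

```python
import torch

def craft_poison_image(encoder, x_orig, x_dest, eps=8 / 255, steps=2000):
    """PGD sketch: perturb x_orig within an L-infinity ball of radius eps so that its
    features under the frozen vision encoder match those of the destination-concept
    image x_dest. The result is later paired with destination-concept text."""
    encoder.eval()
    with torch.no_grad():
        target_feat = encoder(x_dest)  # features of the destination-concept image

    delta = torch.zeros_like(x_orig, requires_grad=True)
    for t in range(steps):
        step_size = 0.2 / 255 if t < 1000 else 0.1 / 255  # schedule quoted in the paper
        feat = encoder(x_orig + delta)
        loss = torch.norm(feat - target_feat)  # feature-matching objective (assumed L2)
        loss.backward()

        with torch.no_grad():
            delta -= step_size * delta.grad.sign()              # signed-gradient descent step
            delta.clamp_(-eps, eps)                             # project onto the L-inf ball
            delta.copy_((x_orig + delta).clamp(0, 1) - x_orig)  # keep a valid image in [0, 1]
        delta.grad = None

    return (x_orig + delta).detach()
```

In the attack described by the paper, such a perturbed image would then be paired with text describing the destination concept (e.g., the persuasive narratives used in the Persuasion Attack) and mixed into the visual instruction-tuning data.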