Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Attention! Your Vision Language Model Could Be Maliciously Manipulated

Authors: Xiaosen Wang, Shaokang Wang, Zhijin Ge, Yuyang Luo, Shudong Zhang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we empirically and theoretically demonstrate that VLMs are particularly susceptible to image-based adversarial examples, where imperceptible perturbations can precisely manipulate each output token. To this end, we propose a novel attack called Vision-language model Manipulation Attack (VMA)... Extensive empirical evaluations substantiate the efficacy and generalizability of VMA across diverse scenarios and datasets.
Researcher Affiliation | Academia | Xiaosen Wang¹, Shaokang Wang², Zhijin Ge³, Yuyang Luo⁴, Shudong Zhang³; ¹Huazhong University of Science and Technology, ²Shanghai Jiaotong University, ³Xidian University, ⁴Brown University
Pseudocode | Yes | Algorithm 1 Vision-language model Manipulation Attack (VMA)
Open Source Code | Yes | Code is available at https://github.com/Trustworthy-AI-Group/VMA.
Open Datasets | Yes | We employ four open-source VLMs, namely Llava [20], Phi3 [1], Qwen2-VL [34], and DeepSeek-VL [25] to evaluate the effectiveness of VMA... We design three cross-matching tasks... using a randomly sampled subset of 1,000 images from the COCO dataset [18]... we sample 518 images from the POPE benchmark [17]... we sampled 438 images from the MLLMU-Bench dataset [24]... we filter 161 non-rejectable image-text pairs from the MM-Vet dataset [48].
Dataset Splits | Yes | To validate the effectiveness of VMA to manipulate the output of VLMs, we construct an evaluation candidate pool by randomly pairing 1,000 distinct prompts, images, and target outputs. Then, we randomly sample 1,000 text-image input-output pairs for evaluation. Detailed settings and results are summarized in Appendix E.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU models, CPU models, or memory specifications.
Software Dependencies | No | The paper mentions using GPT-4o-2024-08-06 for evaluation and algorithms like Adam and PGD, but does not provide specific version numbers for general software dependencies or libraries used for implementation.
Experiment Setup | Yes | For the perturbation constraints, we adopt ℓ∞-norm to ensure its imperceptibility with various perturbation budgets, namely ϵ = 4/255, 8/255, 16/255... where x0 = x initializes the adversarial image, α is the step size... The complete algorithm is summarized in Algorithm 1 [which includes perturbation budget ϵ, step size α, exponential decay rates β1 and β2, and number of iterations N]... Learning rate: As shown in Tab. 9, when the learning rate is set to a small value (e.g., 0.001)... stabilizes around a learning rate of 0.1. Momentum coefficient: As shown in Tab. 10... reaching its peak at 0.9. Based on that, we adopt 0.9 as the default momentum coefficient in our experiments.
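
The Experiment Setup entry names a perturbation budget ϵ, a step size α, decay rates β1 and β2, and an iteration count N, and the Software Dependencies entry mentions Adam and PGD. As a rough illustration of how such hyperparameters usually interact, the sketch below runs a generic Adam-style update followed by projection onto an ℓ∞ ball. It is not the authors' Algorithm 1, which is not reproduced on this page. The function name `linf_adam_attack`, the toy quadratic loss, the values α = 1/255 and β2 = 0.999, and the small stabilizer `tiny` are all assumptions made for the example; only the budget 8/255 and the momentum value 0.9 echo numbers quoted above.

```python
import math
import random


def linf_adam_attack(x, grad_fn, eps=8 / 255, alpha=1 / 255,
                     beta1=0.9, beta2=0.999, n_iter=100, tiny=1e-8):
    """Generic Adam-style descent projected onto an l_inf ball (illustrative only).

    x       : flat list of pixel values in [0, 1] (the clean image)
    grad_fn : hypothetical callable returning the loss gradient at an image
    eps     : l_inf perturbation budget
    alpha   : step size (assumed value, not taken from the paper)
    """
    x_adv = list(x)          # x0 = x initializes the adversarial image
    m = [0.0] * len(x)       # first-moment estimate
    v = [0.0] * len(x)       # second-moment estimate
    for t in range(1, n_iter + 1):
        g = grad_fn(x_adv)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            m_hat = m[i] / (1 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            p = x_adv[i] - alpha * m_hat / (math.sqrt(v_hat) + tiny)
            # Project back into the eps-ball around the clean pixel,
            # then into the valid pixel range.
            p = min(max(p, x[i] - eps), x[i] + eps)
            x_adv[i] = min(max(p, 0.0), 1.0)
    return x_adv


# Toy stand-in for a model loss: squared distance to a fixed target image.
random.seed(0)
x = [random.uniform(0.2, 0.5) for _ in range(16)]
target = [p + 0.5 for p in x]


def loss(img):
    return 0.5 * sum((a - b) ** 2 for a, b in zip(img, target))


def grad(img):
    return [a - b for a, b in zip(img, target)]


eps = 8 / 255
x_adv = linf_adam_attack(x, grad, eps=eps)
print(loss(x), loss(x_adv))
```

In a real attack, `grad_fn` would backpropagate a token-level loss through the VLM to the input image. The projection step is what keeps the perturbation within the stated budget regardless of how the optimizer moves.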