Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning

Authors: Zhifang Zhang, Shuo He, Haobo Wang, Bingquan Shen, Lei Feng

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that RVPT tunes only 0.27% of the parameters in CLIP, yet it significantly outperforms state-of-the-art defense methods, reducing the attack success rate from 89.70% to 2.76% against the most advanced multimodal attacks on ImageNet and effectively generalizes its defensive capabilities across multiple datasets.
Researcher Affiliation | Academia | Zhifang Zhang (1,2), Shuo He (3), Haobo Wang (4), Bingquan Shen (5), Lei Feng (1); (1) Southeast University, (2) University of Queensland, (3) Nanyang Technological University, (4) Zhejiang University, (5) National University of Singapore
Pseudocode | No | The paper describes the proposed approach Repulsive Visual Prompt Tuning (RVPT) using text and mathematical equations (e.g., $\mathcal{L}_{FR} = \frac{1}{L-D} \sum_{d=D+1}^{L} \cos(c_i^d, \sigma_i^d)$ and $\mathcal{L} = \mathcal{L}_{CE} + \alpha \mathcal{L}_{FR}$), but it does not include a distinct section or figure explicitly labeled as 'Pseudocode' or 'Algorithm', nor does it present the steps in a structured, code-like format.
Open Source Code | Yes | The code is publicly available in our GitHub repository: https://github.com/zhangzf01/RVPT.
Open Datasets | Yes | We utilize ImageNet [10] to evaluate the performance of defense methods against all the aforementioned attacks, with the target class of the attacks set to banana. Additionally, we use Caltech101 [14] and Oxford Pet [50] (a fine-grained dataset) to evaluate defense methods against BadNet, Blended, and WaNet. This paper assesses ASR and CA using three downstream datasets: ImageNet1K [10], Caltech101 [14], and Oxford Pets [50]. For cross-domain evaluation, ImageNet-V2 [54], ImageNet-A [20], ImageNet-R [19], ImageNet-Sketch [61] are also evaluated. Additionally, CleanCLIP selects clean image-text pairs from CC3M [55] for fine-tuning the backdoored CLIP model.
Dataset Splits | Yes | For all datasets, we randomly sample 16 images per class while setting the batch size and the epoch number to 32 and 50.
Hardware Specification | Yes | We conduct experiments on eight NVIDIA RTX 3090 GPUs and the computational expenses are shown in Appendix D.
Software Dependencies | No | The paper mentions using PyTorch [51] for implementation and references OpenAI's CLIP, but it does not provide specific version numbers for these or other software dependencies (e.g., 'PyTorch 1.x', 'CUDA 11.x', 'Python 3.x') necessary for full reproducibility of the software environment.
Experiment Setup | Yes | The prompt is initialized from a zero-mean Gaussian distribution. We set the non-prompted depth D = 3, learnable context length b = 50, and balancing factor α = 2. Moreover, the proxy caption of the class is the same as the simple prompt engineering in the original paper of CLIP [52]: (1) a photo of <CLS>. for ImageNet and Caltech101 (2) a photo of <CLS>, a type of pet. for Oxford Pets. For all datasets, we randomly sample 16 images per class while setting the batch size and the epoch number to 32 and 50. The loss is optimized with SGD with an initial learning rate of 0.002 decayed by the cosine annealing rule.
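
The reported setup can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation (see the linked repository for that); it only shows how the quoted hyperparameters fit together. Several points are assumptions rather than facts from the paper: the function names are hypothetical, the cosine schedule is assumed to decay to zero over the 50 epochs, and the feature-repelling term is assumed to be the mean cosine similarity between two per-layer feature vectors over layers D+1 to L.

```python
import math
import random
from collections import defaultdict

# Values quoted in the Experiment Setup row.
NON_PROMPTED_DEPTH = 3   # D
ALPHA = 2.0              # balancing factor between the two loss terms
SHOTS_PER_CLASS = 16
BATCH_SIZE = 32
EPOCHS = 50
BASE_LR = 0.002


def sample_few_shot(labels, shots=SHOTS_PER_CLASS, seed=0):
    """Return indices of up to `shots` randomly chosen examples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        chosen.extend(rng.sample(pool, min(shots, len(pool))))
    return chosen


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=EPOCHS):
    """Cosine annealing; the final learning rate of zero is an assumption."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def feature_repelling_loss(prompted, reference, depth=NON_PROMPTED_DEPTH):
    """Mean cosine similarity over layers D+1..L (assumed reading of L_FR).

    `prompted` and `reference` hold one feature vector per layer, layer 1 first.
    """
    pairs = list(zip(prompted, reference))[depth:]
    return sum(cosine_similarity(c, s) for c, s in pairs) / len(pairs)


def total_loss(ce_loss, fr_loss, alpha=ALPHA):
    """Combined objective L = L_CE + alpha * L_FR."""
    return ce_loss + alpha * fr_loss
```

For example, `cosine_lr(0)` gives the initial rate of 0.002, and `sample_few_shot(labels)` returns the indices of the 16-shot training subset for a list of integer class labels.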