Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ToxicTextCLIP: Text-Based Poisoning and Backdoor Attacks on CLIP Pre-training

Authors: Xin Yao, Haiyang Zhao, Yimin Chen, Jiawei Guo, Kecheng Huang, Ming Zhao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on classification and retrieval tasks show that ToxicTextCLIP achieves up to 95.83% poisoning success and 98.68% backdoor Hit@1, while bypassing RoCLIP, CleanCLIP and SafeCLIP defenses. The framework addresses two key challenges: semantic misalignment caused by background inconsistency with the target class, and the scarcity of background-consistent texts. To this end, ToxicTextCLIP iteratively applies: 1) a background-aware selector that prioritizes texts with background content aligned to the target class, and 2) a background-driven augmenter that generates semantically coherent and diverse poisoned samples.
Researcher Affiliation | Academia | Xin Yao¹, Haiyang Zhao¹, Yimin Chen², Jiawei Guo¹, Kecheng Huang¹, and Ming Zhao¹. ¹School of Computer Science and Engineering, Central South University, China. ²Miner School of Computer & Information Sciences, University of Massachusetts Lowell, USA.
Pseudocode | Yes | 9.1 Algorithmic Details of ToxicTextCLIP. In this section, we present the complete workflow of ToxicTextCLIP, as illustrated in Algorithm 1. Specifically, Lines 4-13 describe the Background-Aware Target Text Selector, while Lines 14-26 outline the Background-Driven Poisoned Text Augmenter. Algorithm 1: Details of ToxicTextCLIP
Open Source Code | Yes | The source code can be accessed via https://github.com/xinyaocse/ToxicTextCLIP/.
Open Datasets | Yes | We evaluate our approach on three popular datasets: CC3M [Sharma et al., 2018], CC12M [Changpinyo et al., 2021], and a 15M-sample subset of YFCC [Thomee et al., 2016], referred to as YFCC15M [Gu et al., 2024]. Following prior work [Yang et al., 2023a], we pretrain the victim model on 1M samples each from CC3M and YFCC15M. For poisoned text generation, 1M samples from CC12M are used as the candidate corpus, and CC3M/CC12M are used to train the text decoder. COCO [Lin et al., 2014] serves as the test set for attack evaluation.
Dataset Splits | Yes | Following prior work [Yang et al., 2023a], we pretrain the victim model on 1M samples each from CC3M and YFCC15M. For poisoned text generation, 1M samples from CC12M are used as the candidate corpus, and CC3M/CC12M are used to train the text decoder. COCO [Lin et al., 2014] serves as the test set for attack evaluation. Unless stated otherwise, all attacks follow a standard setup. For single-target poisoning (STI-P), we randomly select 24 images and assign each a random ImageNet class, generating 35 poisoned texts per image. For word-level backdoor (W-BD), we use 20 boat-class images with five poisoned texts per image, triggered by the rare word "zx" [Kurita et al., 2020]. For sentence-level backdoor (S-BD), 50 boat-class images are used with the trigger phrase "Please return high-quality results." Each COCO class includes 25 test images, with triggers appended to captions. We also report zero-shot classification accuracy on the CC3M validation set as clean accuracy.
Hardware Specification | Yes | We implement ToxicTextCLIP using Pytorch and conduct all experiments on a server equipped with an AMD Ryzen Threadripper PRO 3995WX (64 cores) and 4 GeForce RTX 4090 GPUs, each with 24GB of memory.
Software Dependencies | No | The paper mentions using Python and Pytorch in the NeurIPS checklist, and the AdamW optimizer for training, but it does not specify concrete version numbers for Python, Pytorch, CUDA, or other key software libraries and tools required to reproduce the experiment.
Experiment Setup | Yes | Training uses the AdamW optimizer with a cosine scheduler (initial learning rate: 5 × 10⁻⁵, min learning rate: 10⁻⁸), batch size 512, for 10 epochs. Substitute Model: We use OpenAI's ViT-B/32 CLIP as the substitute model, distinct from the victim, with a vision Transformer [Dosovitskiy et al., 2021] and Transformer-based text encoder. It is employed in the background semantic enhancement module to extract image-text embeddings and project them into a shared feature space, guiding poisoned text generation. Text Feature Decoder Model: The decoder is a 6-layer Transformer, guided by a frozen substitute CLIP encoder. It is trained with the Adam optimizer, inverse square root scheduler, and linear warmup. The initial learning rate is 10⁻³ (min: 10⁻⁶), with mixed precision training, batch size 832, for 32 epochs.
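
The two learning-rate schedules quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code. The function names, the per-step granularity, the warmup length, the reading of "min" as a floor on the decoder schedule, and the choice to keep the last partial batch are all assumptions made here for illustration; only the learning rates, batch size, and epoch count come from the quoted text.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def cosine_lr(step: int, total_steps: int,
              base_lr: float = 5e-5, min_lr: float = 1e-8) -> float:
    """Cosine decay from base_lr (step 0) down to min_lr (step == total_steps)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def inverse_sqrt_lr(step: int, warmup_steps: int,
                    peak_lr: float = 1e-3, min_lr: float = 1e-6) -> float:
    """Linear warmup from min_lr to peak_lr, then decay proportional to 1/sqrt(step).

    Assumption: min_lr acts as both the warmup start and a lower bound;
    the quoted text does not say which role the minimum plays.
    """
    if step < warmup_steps:
        lr = min_lr + (peak_lr - min_lr) * step / warmup_steps
    else:
        lr = peak_lr * math.sqrt(warmup_steps / step)
    return max(lr, min_lr)


if __name__ == "__main__":
    # Victim model: 1M samples, batch size 512, 10 epochs (from the quoted setup).
    total = 10 * steps_per_epoch(1_000_000, 512)
    for s in (0, total // 2, total):
        print(f"victim  step {s:>6}: lr = {cosine_lr(s, total):.3e}")

    # Decoder: the warmup length is hypothetical; it is not stated in the quoted text.
    warmup = 1000
    for s in (0, warmup, 4 * warmup):
        print(f"decoder step {s:>6}: lr = {inverse_sqrt_lr(s, warmup):.3e}")
```

Under these assumptions, the victim schedule starts at 5 × 10⁻⁵ and ends at 10⁻⁸, while the decoder schedule peaks at 10⁻³ when warmup finishes and halves each time the step count quadruples.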