Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks
Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models. |
| Researcher Affiliation | Collaboration | Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman, {hangeryang18, mxuan, baharan}@cs.ucla.edu, Computer Science Department, UCLA; "This research was supported by the National Science Foundation CAREER Award 2146492 and Cisco Systems." |
| Pseudocode | Yes | Algorithm 1 Robust CLIP pre-training (ROCLIP) |
| Open Source Code | Yes | Code is available at https://github.com/BigML-CS-UCLA/RoCLIP |
| Open Datasets | Yes | We use Conceptual Captions 3M (CC3M) (Sharma et al., 2018) as our pre-training dataset. ... We assess our method on 10 downstream datasets introduced by (Kornblith et al., 2019), the detail of which can be found in Table 1. |
| Dataset Splits | Yes | For pre-training, we randomly sampled 1M image-caption pairs from CC3M as our training dataset. ... We choose a random target image xt from the Conceptual Captions validation set, and then choose a random target class from the ImageNet test set to generate a set of |Tadv| adversarial captions. |
| Hardware Specification | No | The paper discusses the experimental process and training details but does not specify any particular CPU, GPU models, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using an 'open-source implementation of CLIP' with 'ResNet50 as the image encoder and Transformer as the text encoder' and refers to specific loss functions like 'InfoNCE loss' and augmentation policies like 'EDA', but it does not provide specific version numbers for any software libraries, frameworks, or dependencies used. |
| Experiment Setup | Yes | Each experiment is run with a batch size of 512 for 24 epochs... We select 2% of the total dataset size as our pool size and K = 3 in our experiments. ... In particular, we use random image cropping, horizontal flipping, color jittering (Wu et al., 2018), grayscale conversion (Wu et al., 2018), and blurring (Chen et al., 2020) in our image augmentation policies. For the text augmentation, we use the EDA proposed by (Wei & Zou, 2019), which includes synonym replacement, random swap, and random deletion as its augmentation policies. |
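The EDA text-augmentation policy cited in the Experiment Setup row (Wei & Zou, 2019) can be illustrated with a short sketch. This is not the paper's implementation: the function name `eda_augment`, the deletion probability, and the decision to cover only random swap and random deletion (synonym replacement needs an external thesaurus such as WordNet, so it is omitted here) are all assumptions made for illustration.

```python
import random

def eda_augment(caption: str, p_delete: float = 0.1) -> str:
    """Toy EDA-style caption augmentation: one random swap, then random deletion.

    Hypothetical sketch of the policy described in the paper; synonym
    replacement (the third EDA operation) is omitted because it requires
    a thesaurus. Returns the original caption if everything is deleted.
    """
    words = caption.split()

    # Random swap: exchange two distinct word positions.
    if len(words) > 1:
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]

    # Random deletion: drop each word with probability p_delete
    # (always keep a single-word caption intact).
    kept = [w for w in words if len(words) == 1 or random.random() > p_delete]

    return " ".join(kept) or caption
```

Applied during pre-training, such augmentations produce perturbed captions that, per the paper, help break the association between poisoned images and their adversarial captions.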