Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks

Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman

NeurIPS 2023

Reproducibility assessment: each entry below gives the variable, the result, and the LLM's supporting response.
Research Type: Experimental — "Our extensive experiments show that RoCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models."
Researcher Affiliation: Collaboration — Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman ({hangeryang18, mxuan, baharan}@cs.ucla.edu), Computer Science Department, UCLA; "This research was supported by the National Science Foundation CAREER Award 2146492 and Cisco Systems."
Pseudocode: Yes — "Algorithm 1: Robust CLIP pre-training (RoCLIP)". A hedged sketch of this training step appears after this list.
Open Source Code: Yes — "Code is available at https://github.com/BigML-CS-UCLA/RoCLIP"
Open Datasets: Yes — "We use Conceptual Captions 3M (CC3M) (Sharma et al., 2018) as our pre-training dataset. ... We assess our method on 10 downstream datasets introduced by (Kornblith et al., 2019), the details of which can be found in Table 1."
Dataset Splits: Yes — "For pre-training, we randomly sampled 1M image-caption pairs from CC3M as our training dataset. ... We choose a random target image x_t from the Conceptual Captions validation set, and then choose a random target class from the ImageNet test set to generate a set of |T_adv| adversarial captions." (This poisoning setup is sketched after this list.)
Hardware Specification: No — The paper describes the experimental procedure and training details but does not specify the CPU or GPU models, or any other hardware, used to run the experiments.
Software Dependencies: No — The paper mentions an "open-source implementation of CLIP" with "ResNet50 as the image encoder and Transformer as the text encoder" and refers to specific components such as the InfoNCE loss and the EDA augmentation policy, but it does not give version numbers for any software libraries, frameworks, or dependencies.
Experiment Setup: Yes — "Each experiment is run with a batch size of 512 for 24 epochs... We select 2% of the total dataset size as our pool size and K = 3 in our experiments. ... In particular, we use random image cropping, horizontal flipping, color jittering (Wu et al., 2018), grayscale conversion (Wu et al., 2018), and blurring (Chen et al., 2020) in our image augmentation policies. For the text augmentation, we use the EDA proposed by (Wei & Zou, 2019), which includes synonym replacement, random swap, and random deletion as its augmentation policies." (The image augmentation pipeline is sketched below.)
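
To make Algorithm 1 concrete, here is a minimal sketch of the RoCLIP loss computation in PyTorch, assuming pre-normalized embeddings. The matching direction (caption-to-pool nearest neighbor), the pool update rule, and the helper names `clip_loss` and `roclip_loss` are simplifying assumptions based on the paper's description, not its actual implementation.

```python
import torch
import torch.nn.functional as F

def clip_loss(z_img, z_txt, temperature=0.07):
    """Standard symmetric InfoNCE (CLIP) loss over a batch of
    L2-normalized image/caption embeddings of shape (batch, dim)."""
    logits = z_img @ z_txt.T / temperature
    targets = torch.arange(z_img.size(0), device=z_img.device)
    loss_i = F.cross_entropy(logits, targets)    # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)  # text -> image direction
    return (loss_i + loss_t) / 2

def roclip_loss(z_img, z_txt, pool, epoch, K=3):
    """One RoCLIP loss computation (hedged sketch of Algorithm 1).

    pool: L2-normalized caption embeddings from a varying random pool
    (2% of the dataset size in the paper). Every K-th epoch (K = 3 in the
    paper), each caption is swapped for its nearest neighbor in the pool
    before the CLIP loss is computed, breaking poisoned image-caption
    associations; other epochs train with the ordinary CLIP loss.
    """
    if epoch % K == 0 and epoch > 0:
        nn_idx = (z_txt @ pool.T).argmax(dim=-1)  # nearest pool caption
        z_txt = pool[nn_idx]
    return clip_loss(z_img, z_txt)
```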
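The targeted data poisoning setup quoted under Dataset Splits (one target image x_t paired with |T_adv| adversarial captions naming a target ImageNet class) can be sketched as below; the caption templates are hypothetical illustrations, not the paper's actual adversarial captions.

```python
import random

def make_poison_pairs(target_image, target_class_name, n_adv_captions, seed=0):
    """Pair one target image with |T_adv| adversarial captions that
    mention a target ImageNet class, as in a targeted poisoning attack."""
    rng = random.Random(seed)
    templates = [  # hypothetical prompt templates for illustration
        "a photo of a {}",
        "a close-up photo of a {}",
        "a cropped photo of a {}",
    ]
    captions = [rng.choice(templates).format(target_class_name)
                for _ in range(n_adv_captions)]
    # These pairs are injected into the pre-training data; a successful
    # attack makes CLIP map the target image to the target class.
    return [(target_image, caption) for caption in captions]
```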
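The image augmentation policy quoted under Experiment Setup maps naturally onto torchvision transforms. The parameter values below are common SimCLR-style defaults, not values reported in the paper; the EDA text augmentations (synonym replacement, random swap, random deletion) would be applied to captions analogously.

```python
from torchvision import transforms

# Random cropping, horizontal flipping, color jittering, grayscale
# conversion, and blurring, per the quoted augmentation policy.
image_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),
    transforms.ToTensor(),
])
```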