A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)

Authors: Weijie Tu, Weijian Deng, Tom Gedeon

NeurIPS 2023

Reproducibility assessment: each entry below gives the assessed variable, the result, and the supporting LLM response.
Research Type: Experimental. This work comprehensively investigates the safety objectives of CLIP models, specifically focusing on three key properties: resilience to visual factor variations, calibrated uncertainty estimations, and the ability to detect anomalous inputs. To this end, we study 83 CLIP models and 127 ImageNet classifiers, which are diverse in architecture, (pre-)training distribution, and training strategy. We consider 10 visual factors (e.g., shape and pattern), 5 types of out-of-distribution data, and 8 natural and challenging test conditions with different shift types, such as texture, style, and perturbation shifts. Our study unveils several previously unknown insights into CLIP models.
Researcher Affiliation: Academia. Weijie Tu (1), Weijian Deng (1), Tom Gedeon (2,3); (1) The Australian National University, (2) Curtin University, (3) Óbuda University.
Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper states: 'All the above models, including CLIP, are publicly available on TIMM [50] and OpenCLIP [51]'. This refers to the external models used in the study, not the authors' own implementation code or methodology.
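
Since the study evaluates only publicly released checkpoints, assembling the model zoo reduces to loading them from the two hubs named above. Below is a minimal loading sketch, assuming the timm and open_clip_torch Python packages are installed; the architecture names and pretraining tags are illustrative stand-ins, not the paper's exact model list.

    import timm
    import open_clip

    # A supervised ImageNet classifier from TIMM (one of the 127 studied).
    classifier = timm.create_model("resnet50", pretrained=True).eval()

    # A zero-shot CLIP model from OpenCLIP (one of the 83 studied);
    # this (architecture, pretrained-tag) pair is only an example.
    clip_model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")
    clip_model.eval()
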
Open Datasets: Yes. We use 51 zero-shot CLIP models (CLIP) and 32 ImageNet fine-tuned CLIP models (CLIP-FT)... We first pinpoint failure patterns of models by testing on ImageNet-X [13]... We use a large-scale OOD detection benchmark built upon ImageNet: in-distribution (ID) ImageNet vs. {iNaturalist [52], SUN [53], Places [54], Textures [55], and ImageNet-O [7]} (OOD)... We study ID and OOD datasets, where the ImageNet validation set is the ID dataset and the OOD datasets are: ImageNet-V2 [3], ImageNet-Rendition [5], ImageNet-Adversarial [7], ImageNet-Sketch [4], ObjectNet [6], and ImageNet-Vid-Robust [56].
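
The OOD detection benchmark pairs ImageNet (ID) against each of the listed OOD sets. As an illustration of how such a benchmark is typically scored, here is a sketch using the maximum softmax probability (MSP) baseline and AUROC; the choice of MSP as the detection score is an assumption for illustration, not a detail confirmed by this report.

    import numpy as np
    import torch.nn.functional as F
    from sklearn.metrics import roc_auc_score

    def msp_scores(logits):
        # Maximum softmax probability: higher means "more in-distribution".
        return F.softmax(logits, dim=-1).max(dim=-1).values.detach().cpu().numpy()

    def ood_auroc(id_logits, ood_logits):
        # Label ID samples 1 and OOD samples 0; AUROC measures how well
        # the score separates the two sets.
        scores = np.concatenate([msp_scores(id_logits), msp_scores(ood_logits)])
        labels = np.concatenate([np.ones(len(id_logits)), np.zeros(len(ood_logits))])
        return roc_auc_score(labels, scores)
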
Dataset Splits: Yes. We study ID and OOD datasets, where the ImageNet validation set is the ID dataset and the OOD datasets are: ImageNet-V2 [3], ImageNet-Rendition [5], ImageNet-Adversarial [7], ImageNet-Sketch [4], ObjectNet [6], and ImageNet-Vid-Robust [56]. Metrics are expected calibration error (ECE) [57] and negative log-likelihood (NLL); a lower ECE or NLL indicates better calibration. ...we divide the validation set of ImageNet into two halves: one for temperature scaling (the ID calibration set), and the other for ID testing.
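
The calibration protocol above is concrete enough to sketch: fit a single temperature on one half of the ImageNet validation set by minimizing NLL, then report ECE and NLL on the other half. A minimal sketch, assuming pre-computed logits and integer labels as PyTorch tensors; the 15-bin ECE and the LBFGS optimizer are common defaults, not settings confirmed by the paper.

    import torch
    import torch.nn.functional as F

    def expected_calibration_error(logits, labels, n_bins=15):
        # Standard ECE: average |accuracy - confidence| gap over confidence bins.
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        correct = pred.eq(labels).float()
        edges = torch.linspace(0, 1, n_bins + 1)
        ece = torch.zeros(1)
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (conf > lo) & (conf <= hi)
            if in_bin.any():
                gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs()
                ece += in_bin.float().mean() * gap
        return ece.item()

    def fit_temperature(cal_logits, cal_labels):
        # Temperature scaling: fit one scalar T > 0 by minimizing NLL
        # (cross-entropy) on the ID calibration half.
        log_t = torch.zeros(1, requires_grad=True)
        optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

        def closure():
            optimizer.zero_grad()
            loss = F.cross_entropy(cal_logits / log_t.exp(), cal_labels)
            loss.backward()
            return loss

        optimizer.step(closure)
        return log_t.exp().item()

    # Usage: T = fit_temperature(cal_logits, cal_labels), then evaluate
    # expected_calibration_error(test_logits / T, test_labels) on the held-out half.
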
Hardware Specification: No. The paper does not provide any specific details about the hardware used for its experiments (e.g., GPU models, CPU types, or cluster specifications).
Software Dependencies: No. The paper mentions that models are 'publicly available on TIMM [50] and OpenCLIP [51]' but does not provide specific version numbers for any software libraries, frameworks (such as PyTorch or TensorFlow), or other ancillary software components used in the experiments.
Experiment Setup: No. The paper describes the models and datasets used, and mentions fine-tuning procedures (e.g., 'fine-tuned on ImageNet-1K', 'additional fine-tuning on ImageNet-12K') and prompt templates ('default prompt template by [1]'), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations (e.g., optimizer settings) for the experimental setups.
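
On the prompt side, the only detail given is that the default template from [1] is used. Here is a minimal zero-shot classification sketch with the single template "a photo of a {}.", reusing an OpenCLIP model and tokenizer as loaded in the earlier sketch; note that CLIP's original recipe averages text embeddings over a larger set of templates, which is omitted here for brevity.

    import torch

    @torch.no_grad()
    def zero_shot_classify(model, tokenizer, class_names, images):
        # One text embedding per class from the default prompt template.
        prompts = [f"a photo of a {name}." for name in class_names]
        text_feats = model.encode_text(tokenizer(prompts))
        text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

        # images: a batch tensor already run through the model's preprocess.
        image_feats = model.encode_image(images)
        image_feats = image_feats / image_feats.norm(dim=-1, keepdim=True)

        # Cosine similarities act as logits; the top class is the prediction.
        logits = 100.0 * image_feats @ text_feats.T
        return logits.argmax(dim=-1)
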