A Closer Look at the Robustness of Contrastive Language-Image Pre-Training (CLIP)
Authors: Weijie Tu, Weijian Deng, Tom Gedeon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work comprehensively investigates the safety objectives of CLIP models, specifically focusing on three key properties: resilience to visual factor variations, calibrated uncertainty estimations, and the ability to detect anomalous inputs. To this end, we study 83 CLIP models and 127 ImageNet classifiers. They are diverse in architecture, (pre)training distribution and training strategies. We consider 10 visual factors (e.g., shape and pattern), 5 types of out-of-distribution data, and 8 natural and challenging test conditions with different shift types, such as texture, style, and perturbation shifts. Our study has unveiled several previously unknown insights into CLIP models. |
| Researcher Affiliation | Academia | Weijie Tu¹, Weijian Deng¹, Tom Gedeon²,³ (¹The Australian National University, ²Curtin University, ³Óbuda University) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'All the above models, including CLIP, are publicly available on TIMM [50] and OpenCLIP [51]'. This refers to the external models used in the study, not the authors' own implementation code or methodology (a hedged loading sketch follows the table). |
| Open Datasets | Yes | We use 51 zero-shot CLIP models (CLIP) and 32 ImageNet fine-tuned CLIP models (CLIP-FT)... We first pinpoint failure patterns of models by testing on ImageNet-X [13]... We use a large-scale OOD detection benchmark built on ImageNet: in-distribution (ID) ImageNet vs. {iNaturalist [52], SUN [53], Places [54], Texture [55] and ImageNet-O [7]} (OOD); see the OOD scoring sketch below the table... We study ID and OOD datasets, where the ImageNet validation set is the ID dataset and the OOD datasets are: ImageNet-V2 [3], ImageNet-Rendition [5], ImageNet-Adversarial [7], ImageNet-Sketch [4], ObjectNet [6] and ImageNet-Vid-Robust [56]. |
| Dataset Splits | Yes | We study ID and OOD datasets, where the ImageNet validation set is the ID dataset and the OOD datasets are: ImageNet-V2 [3], ImageNet-Rendition [5], ImageNet-Adversarial [7], ImageNet-Sketch [4], ObjectNet [6] and ImageNet-Vid-Robust [56]. Metrics are estimated calibration error (ECE) [57] and negative log-likelihood (NLL); a lower ECE or NLL indicates better calibration. ...we divide the validation set of ImageNet into two halves: one for temperature scaling (ID calibration set), and the other one for ID test (see the calibration sketch below the table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running its experiments (e.g., GPU models, CPU types, or cluster specifications). |
| Software Dependencies | No | The paper mentions that models are 'publicly available on TIMM [50] and OpenCLIP [51]' but does not provide specific version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or other ancillary software components used in their experiments. |
| Experiment Setup | No | The paper describes the models and datasets used, and mentions fine-tuning procedures (e.g., 'fine-tuned on ImageNet-1K', 'additional fine-tuning on ImageNet-12K') and prompt templates ('default prompt template by [1]'), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations (e.g., optimizer settings) for the experimental setups. |
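
Although no first-party code accompanies the paper, the checkpoints it evaluates are public on TIMM and OpenCLIP. Below is a minimal sketch of loading one model from each hub and running zero-shot CLIP classification with the default prompt template; the model tags, file name, and two-class list are illustrative stand-ins for the paper's 83 CLIP models and 127 ImageNet classifiers, not the authors' exact configuration.

```python
# Hedged sketch: loading publicly available checkpoints of the kind the paper
# evaluates. Model/pretrained tags are illustrative, not the authors' exact list.
import torch
import open_clip
import timm
from PIL import Image

# A CLIP model from OpenCLIP (any public tag would do).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# The default prompt template from the original CLIP paper [1].
classnames = ["goldfish", "tabby cat"]  # stand-ins for the 1,000 ImageNet classes
prompts = [f"a photo of a {c}." for c in classnames]

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical input
with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(tokenizer(prompts))
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_feat @ text_feat.T  # zero-shot class scores

# An ImageNet classifier from TIMM, as used for the comparison set.
classifier = timm.create_model("resnet50", pretrained=True).eval()
```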
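The OOD detection row names the benchmark (ImageNet as ID vs. iNaturalist, SUN, Places, Texture, and ImageNet-O as OOD) but the quoted text does not pin down a scoring rule. The sketch below uses the common maximum softmax probability (MSP) baseline scored with AUROC; this is an assumed example and may differ from the authors' detector.

```python
# Hedged sketch: scoring ID (ImageNet val) vs. OOD samples with the maximum
# softmax probability (MSP) baseline; the authors' exact detector may differ.
import torch
from sklearn.metrics import roc_auc_score

def msp_score(logits):
    """Higher score = more in-distribution under the MSP heuristic."""
    return logits.softmax(dim=-1).max(dim=-1).values

def ood_auroc(id_logits, ood_logits):
    """AUROC for separating ID (label 1) from OOD (label 0) samples."""
    scores = torch.cat([msp_score(id_logits), msp_score(ood_logits)])
    labels = torch.cat([torch.ones(len(id_logits)), torch.zeros(len(ood_logits))])
    return roc_auc_score(labels.numpy(), scores.numpy())

# id_logits: model outputs on ImageNet val; ood_logits: on e.g. iNaturalist.
# print(ood_auroc(id_logits, ood_logits))
```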
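The calibration protocol in the Dataset Splits row (fit a single temperature on one half of the ImageNet validation set, report ECE and NLL on held-out data) is standard and can be sketched as follows. The helper names and the 15-bin ECE are our assumptions, not the paper's stated settings.

```python
# Hedged sketch of the described protocol: temperature scaling fitted on one
# half of the ImageNet validation set, ECE/NLL measured on the other half.
import torch
import torch.nn.functional as F

def ece(logits, labels, n_bins=15):
    """Estimated calibration error with equal-width confidence bins."""
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    correct = pred.eq(labels).float()
    bins = torch.linspace(0, 1, n_bins + 1)
    err = torch.zeros(1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            err += in_bin.float().mean() * (correct[in_bin].mean() - conf[in_bin].mean()).abs()
    return err.item()

def fit_temperature(logits, labels):
    """Fit a single scalar temperature by minimizing NLL with LBFGS."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T to keep T positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss
    opt.step(closure)
    return log_t.exp().item()

# calib_*: one half of ImageNet val (ID calibration set); test_*: the other half.
# T = fit_temperature(calib_logits, calib_labels)
# print(ece(test_logits / T, test_labels))                      # ECE after scaling
# print(F.cross_entropy(test_logits / T, test_labels).item())   # NLL after scaling
```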