Does CLIP’s generalization performance mainly stem from high train-test similarity?

Authors: Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak, Matthias Bethge, Wieland Brendel

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP's overall performance remains high. (See the pruning sketch below the table.) |
| Researcher Affiliation | Academia | ¹University of Tübingen, ²Tübingen AI Center, ³Max-Planck-Institute for Intelligent Systems, Tübingen, ⁴ELLIS Institute Tübingen |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/brendel-group/clip-ood |
| Open Datasets | Yes | For example, OpenCLIP (Schuhmann et al., 2022), the open-source version of CLIP (Radford et al., 2021), is trained on LAION-400M, a web-scale dataset with a wide variety of image-text pairs (Schuhmann et al., 2021). |
| Dataset Splits | No | The paper describes various pruned datasets used for training and testing, but it does not specify explicit training, validation, and test splits (e.g., percentages or exact counts) for reproducibility. |
| Hardware Specification | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16+', 'CLIP ViT-B/32', and 'PyTorch' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). We use the implementation provided by Ilharco et al. (2021) and stick to their settings for learning rate, weight decay, etc. (See the launch sketch below the table.) |
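
As the Pseudocode row notes, the paper gives no algorithm blocks, but the pruning step described under Research Type reduces to a nearest-neighbor similarity filter over image embeddings. Below is a minimal sketch, not the authors' code: it assumes pre-computed, L2-normalized CLIP image embeddings for the LAION training pool and an OOD benchmark, and the 0.9 threshold is a hypothetical value; the similarity measure and thresholds actually used are defined in the paper.

```python
import torch

def prune_by_similarity(train_emb: torch.Tensor, test_emb: torch.Tensor,
                        threshold: float = 0.9, chunk_size: int = 4096) -> torch.Tensor:
    """Return a boolean mask keeping training samples whose maximum cosine
    similarity to any test image stays below `threshold`.

    train_emb: (N, D) L2-normalized image embeddings of the training pool
    test_emb:  (M, D) L2-normalized image embeddings of the OOD benchmark
    """
    keep = torch.empty(train_emb.shape[0], dtype=torch.bool)
    # Process the training pool in chunks so the (N, M) similarity matrix
    # never has to be materialized in memory all at once.
    for start in range(0, train_emb.shape[0], chunk_size):
        chunk = train_emb[start:start + chunk_size]    # (b, D)
        sims = chunk @ test_emb.T                      # (b, M) cosine similarities
        keep[start:start + chunk_size] = sims.max(dim=1).values < threshold
    return keep

# Usage with random placeholder embeddings (512-d, as in CLIP ViT-B/32):
train_emb = torch.nn.functional.normalize(torch.randn(100_000, 512), dim=1)
test_emb = torch.nn.functional.normalize(torch.randn(10_000, 512), dim=1)
mask = prune_by_similarity(train_emb, test_emb, threshold=0.9)
pruned_pool = mask.nonzero(as_tuple=True)[0]  # indices of retained samples
```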
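The Hardware Specification and Experiment Setup rows pin down the training recipe: OpenCLIP's reference implementation (Ilharco et al., 2021), ViT-B/32, 32 epochs, and a global batch size of 33,600 on eight A100s, i.e., 4,200 samples per GPU. The sketch below assembles a plausible launch command under those numbers; the training module path varies across OpenCLIP releases (`training.main` in older versions, `open_clip_train.main` in newer ones), the shard path is a placeholder, and all remaining hyperparameters are left at OpenCLIP defaults, as the paper states.

```python
import subprocess

# 33,600 global batch over 8 GPUs -> 4,200 per GPU (OpenCLIP's --batch-size is per GPU).
cmd = [
    "torchrun", "--nproc_per_node=8",
    "-m", "training.main",               # "open_clip_train.main" in newer OpenCLIP releases
    "--model", "ViT-B-32",
    "--train-data", "/path/to/pruned-laion/{00000..00999}.tar",  # placeholder shard list
    "--dataset-type", "webdataset",
    "--batch-size", "4200",
    "--epochs", "32",
]
subprocess.run(cmd, check=True)
```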