Does CLIP’s generalization performance mainly stem from high train-test similarity?
Authors: Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak, Matthias Bethge, Wieland Brendel
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP's overall performance remains high. |
| Researcher Affiliation | Academia | University of Tübingen; Tübingen AI Center; Max-Planck-Institute for Intelligent Systems, Tübingen; ELLIS Institute Tübingen |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/brendel-group/clip-ood |
| Open Datasets | Yes | For example, OpenCLIP (Schuhmann et al., 2022), the open-source version of CLIP (Radford et al., 2021), is trained on LAION-400M, a web-scale dataset with a wide variety of image-text pairs (Schuhmann et al., 2021). |
| Dataset Splits | No | The paper describes various pruned datasets used for training and testing, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or exact counts) for reproducibility. |
| Hardware Specification | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16+', 'CLIP ViT-B/32', and 'PyTorch' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). We use the implementation provided by Ilharco et al. (2021) and stick to their settings for learning rate, weight decay, etc. |
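
The Research Type row above quotes the paper's core intervention: pruning LAION so that its train-test similarity to common OOD benchmarks matches ImageNet's. The sketch below illustrates one way such similarity-based pruning can be set up; the embedding inputs, the cosine-similarity metric, and the `threshold` parameter are illustrative assumptions, not the authors' exact pipeline.

```python
# Hypothetical sketch: keep only training samples whose nearest-neighbor similarity
# to an OOD test set stays below a threshold. Inputs, metric, and threshold are
# assumptions for illustration, not the paper's exact procedure.
import torch


def prune_by_train_test_similarity(
    train_emb: torch.Tensor,   # (N_train, D) image embeddings of training samples
    test_emb: torch.Tensor,    # (N_test, D) image embeddings of OOD test samples
    threshold: float,          # keep samples whose max similarity is below this value
    batch_size: int = 4096,
) -> torch.Tensor:
    """Return a boolean mask over the training set marking samples to keep."""
    train_emb = torch.nn.functional.normalize(train_emb, dim=-1)
    test_emb = torch.nn.functional.normalize(test_emb, dim=-1)
    keep = torch.empty(train_emb.shape[0], dtype=torch.bool)
    for start in range(0, train_emb.shape[0], batch_size):
        chunk = train_emb[start:start + batch_size]
        sims = chunk @ test_emb.T                 # cosine similarity to every test sample
        nearest = sims.max(dim=1).values          # highest similarity per training sample
        keep[start:start + chunk.shape[0]] = nearest < threshold
    return keep


# Usage (hypothetical variable names): drop training samples that are too close
# to the benchmark's test images, then retrain on the pruned split.
# mask = prune_by_train_test_similarity(laion_embeddings, benchmark_embeddings, threshold=0.5)
# pruned_indices = mask.nonzero(as_tuple=True)[0]
```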
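
The Experiment Setup row reports training CLIP ViT-B/32 for 32 epochs at a global batch size of 33,600 on eight A100 GPUs, using the open_clip implementation of Ilharco et al. (2021) with its default learning rate and weight decay. A minimal sketch of instantiating that architecture with open_clip is shown below; the distributed training loop itself is not reproduced, and the `reported_setup` dictionary simply restates the figures quoted in the table.

```python
# Minimal sketch, assuming the open_clip library (Ilharco et al., 2021) is installed.
# This only builds the reported architecture; the training loop would follow
# open_clip's standard training script and is omitted here.
import open_clip

# Randomly initialized ViT-B/32 CLIP model plus its image preprocessing transform.
model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained=None)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

# Hyperparameters as reported in the paper (learning rate and weight decay
# follow open_clip's defaults per the quoted setup).
reported_setup = {
    "model": "ViT-B-32",
    "epochs": 32,
    "global_batch_size": 33_600,   # across one node with eight A100 GPUs
}
```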