Does CLIP’s generalization performance mainly stem from high train-test similarity?

Authors: Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak, Matthias Bethge, Wieland Brendel

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this hypothesis, we retrain CLIP on pruned LAION splits that replicate ImageNet's train-test similarity with respect to common OOD benchmarks. While we observe a performance drop on some benchmarks, surprisingly, CLIP's overall performance remains high. (See the pruning sketch below the table.) |
| Researcher Affiliation | Academia | ¹University of Tübingen, ²Tübingen AI Center, ³Max-Planck-Institute for Intelligent Systems, Tübingen, ⁴ELLIS Institute Tübingen |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/brendel-group/clip-ood |
| Open Datasets | Yes | For example, OpenCLIP (Schuhmann et al., 2022), the open-source version of CLIP (Radford et al., 2021), is trained on LAION-400M, a web-scale dataset with a wide variety of image-text pairs (Schuhmann et al., 2021). |
| Dataset Splits | No | The paper describes various pruned datasets used for training and testing, but it does not specify explicit training, validation, and test splits (e.g., percentages or exact counts) for reproducibility. |
| Hardware Specification | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). |
| Software Dependencies | No | The paper mentions using 'CLIP ViT-B/16+', 'CLIP ViT-B/32', and 'PyTorch' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all our pruning experiments, we train CLIP ViT-B/32 (Dosovitskiy et al., 2020) for 32 epochs with a batch size of 33,600 on one node with eight A100 GPUs (training takes several days, depending on the dataset size). We use the implementation provided by Ilharco et al. (2021) and stick to their settings for learning rate, weight decay, etc. (See the launch sketch below the table.) |
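
As the Pseudocode row notes, the paper gives no algorithm blocks, but the pruning step described under Research Type reduces to a nearest-neighbor similarity filter over image embeddings. Below is a minimal sketch, not the authors' code: it assumes pre-computed, L2-normalized CLIP image embeddings for the LAION training pool and an OOD benchmark, and the 0.9 threshold is a hypothetical value; the similarity measure and thresholds actually used are defined in the paper.

```python
import torch

def prune_by_similarity(train_emb: torch.Tensor, test_emb: torch.Tensor,
                        threshold: float = 0.9, chunk_size: int = 4096) -> torch.Tensor:
    """Return a boolean mask keeping training samples whose maximum cosine
    similarity to any test image stays below `threshold`.

    train_emb: (N, D) L2-normalized image embeddings of the training pool
    test_emb:  (M, D) L2-normalized image embeddings of the OOD benchmark
    """
    keep = torch.empty(train_emb.shape[0], dtype=torch.bool)
    # Process the training pool in chunks so the (N, M) similarity matrix
    # never has to be materialized in memory all at once.
    for start in range(0, train_emb.shape[0], chunk_size):
        chunk = train_emb[start:start + chunk_size]    # (b, D)
        sims = chunk @ test_emb.T                      # (b, M) cosine similarities
        keep[start:start + chunk_size] = sims.max(dim=1).values < threshold
    return keep

# Usage with random placeholder embeddings (512-d, as in CLIP ViT-B/32):
train_emb = torch.nn.functional.normalize(torch.randn(100_000, 512), dim=1)
test_emb = torch.nn.functional.normalize(torch.randn(10_000, 512), dim=1)
mask = prune_by_similarity(train_emb, test_emb, threshold=0.9)
pruned_pool = mask.nonzero(as_tuple=True)[0]  # indices of retained samples
```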
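The Hardware Specification and Experiment Setup rows pin down the training recipe: OpenCLIP's reference implementation (Ilharco et al., 2021), ViT-B/32, 32 epochs, and a global batch size of 33,600 on eight A100s, i.e., 4,200 samples per GPU. The sketch below assembles a plausible launch command under those numbers; the training module path varies across OpenCLIP releases (`training.main` in older versions, `open_clip_train.main` in newer ones), the shard path is a placeholder, and all remaining hyperparameters are left at OpenCLIP defaults, as the paper states.

```python
import subprocess

# 33,600 global batch over 8 GPUs -> 4,200 per GPU (OpenCLIP's --batch-size is per GPU).
cmd = [
    "torchrun", "--nproc_per_node=8",
    "-m", "training.main",               # "open_clip_train.main" in newer OpenCLIP releases
    "--model", "ViT-B-32",
    "--train-data", "/path/to/pruned-laion/{00000..00999}.tar",  # placeholder shard list
    "--dataset-type", "webdataset",
    "--batch-size", "4200",
    "--epochs", "32",
]
subprocess.run(cmd, check=True)
```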