CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP

Authors: Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet T. Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto, Sepp Hochreiter

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets.
Researcher Affiliation | Collaboration | (1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria; (2) Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria; (3) HERE Technologies, Zurich, Switzerland
Pseudocode | Yes | Pseudocode 1 shows CLOOB in a PyTorch-like style. (A hedged sketch in this style is given after the table.)
Open Source Code | Yes | Code is available at: https://github.com/ml-jku/cloob
Open Datasets | Yes | The first dataset, Conceptual Captions (CC) (Sharma et al., 2018)... The second dataset, a subset of YFCC100M (Thomee et al., 2016)...
Dataset Splits | No | The paper mentions training for a certain number of epochs and stopping when 'evaluation performance plateaued', implying a validation process. However, it does not specify the size or splitting methodology of a validation set within the pre-training datasets (Conceptual Captions, YFCC).
Hardware Specification | Yes | We used several different servers equipped with GPUs of different types, such as V100 and A100. The total amount of compute is roughly 11,000 GPU hours (with A100).
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' and 'Weights & Biases (Biewald, 2020)', but these citations do not provide specific version numbers for the software used, which is required for reproducibility.
Experiment Setup | Yes | The hyperparameter values of OpenCLIP were used as default, concretely, a learning rate of 1e-3 and a weight decay of 0.1 for the Adam optimizer... Deviating from OpenCLIP, we used a batch size of 512... we set τ⁻¹ to a fixed value of 30... For modern Hopfield networks, the hyperparameter β was set to 8. ...the learning rate was set to 5e-4 and the batch size to 1024 as used in OpenCLIP... For modern Hopfield networks, the hyperparameter β was set to 14.3. (A minimal configuration sketch based on these values follows the table.)
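The paper's Pseudocode 1 is not reproduced on this page. The following is a minimal sketch of the CLOOB objective in the PyTorch-like style the paper describes, written from the method description rather than from the authors' released code. The function names (`hopfield_retrieval`, `info_loob`, `cloob_loss`) are illustrative, and the defaults reuse the hyperparameters quoted above (β = 8, τ⁻¹ = 30); consult the linked repository for the exact implementation.

```python
import torch
import torch.nn.functional as F

def hopfield_retrieval(state, stored, beta):
    """One retrieval step of a modern Hopfield network: each row of
    `state` queries the `stored` patterns; results are re-normalized."""
    attn = F.softmax(beta * state @ stored.t(), dim=-1)  # [n, n] retrieval weights
    return F.normalize(attn @ stored, dim=-1)            # retrieved patterns

def info_loob(anchors, samples, inv_tau):
    """InfoLOOB ('leave one out bound'): like InfoNCE, but the positive
    pair is excluded from the denominator."""
    n = anchors.shape[0]
    sim = inv_tau * anchors @ samples.t()                # [n, n] scaled similarities
    pos = sim.diagonal()                                 # matching pairs
    mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(mask, float('-inf'))           # drop positives from the denominator
    return (torch.logsumexp(neg, dim=-1) - pos).mean()

def cloob_loss(img_emb, txt_emb, beta=8.0, inv_tau=30.0):
    """CLOOB: retrieve both modalities from each modality's memory via a
    modern Hopfield network, then apply InfoLOOB to the retrievals.
    img_emb, txt_emb: L2-normalized embeddings of a batch of pairs, [n, d]."""
    u_img = hopfield_retrieval(img_emb, img_emb, beta)   # images retrieved from image memory
    u_txt = hopfield_retrieval(txt_emb, img_emb, beta)   # texts retrieved from image memory
    v_img = hopfield_retrieval(img_emb, txt_emb, beta)   # images retrieved from text memory
    v_txt = hopfield_retrieval(txt_emb, txt_emb, beta)   # texts retrieved from text memory
    return info_loob(u_img, u_txt, inv_tau) + info_loob(v_txt, v_img, inv_tau)
```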
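Read together, the quoted experiment setup maps onto a standard PyTorch training configuration. Below is a minimal sketch using the Conceptual Captions values, with the YFCC values noted in comments. The quote says "Adam optimizer" with weight decay; OpenCLIP-style training commonly uses AdamW (decoupled weight decay), which is assumed here, and the placeholder encoders are hypothetical stand-ins for the actual image/text towers.

```python
import torch
import torch.nn as nn

# Hyperparameter values quoted above (Conceptual Captions run);
# the YFCC run instead uses lr = 5e-4, batch_size = 1024, beta = 14.3.
lr = 1e-3
weight_decay = 0.1
batch_size = 512
beta = 8.0       # inverse temperature of the modern Hopfield network
inv_tau = 30.0   # fixed InfoLOOB inverse temperature (not learned, unlike CLIP's)

# Hypothetical placeholder towers; the paper uses ResNet/transformer encoders.
model = nn.ModuleDict({
    "image_encoder": nn.Linear(2048, 512),
    "text_encoder": nn.Linear(768, 512),
})

# AdamW assumed as the decoupled-weight-decay variant of Adam used by OpenCLIP.
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
```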