CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
Authors: Andreas Fürst, Elisabeth Rumetshofer, Johannes Lehner, Viet T. Tran, Fei Tang, Hubert Ramsauer, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto, Sepp Hochreiter
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments we compare CLOOB to CLIP after pre-training on the Conceptual Captions and the YFCC dataset with respect to their zero-shot transfer learning performance on other datasets. CLOOB consistently outperforms CLIP at zero-shot transfer learning across all considered architectures and datasets. |
| Researcher Affiliation | Collaboration | 1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University, Linz, Austria 2 Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria 3 HERE Technologies, Zurich, Switzerland |
| Pseudocode | Yes | Pseudocode 1 shows CLOOB in a PyTorch-like style. (A sketch in this style appears after the table.) |
| Open Source Code | Yes | Code is available at: https://github.com/ml-jku/cloob |
| Open Datasets | Yes | The first dataset, Conceptual Captions (CC) (Sharma et al., 2018)... The second dataset, a subset of YFCC100M (Thomee et al., 2016)... |
| Dataset Splits | No | The paper mentions training for a certain number of epochs and stopping when 'evaluation performance plateaued', which implies a validation process. However, it does not specify the size or splitting methodology of a validation set within the pre-training datasets (Conceptual Captions, YFCC). |
| Hardware Specification | Yes | We used several different servers equipped with GPUs of different types, such as V100 and A100. The total amount of compute is roughly 11,000 GPU hours (with A100). |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' and 'Weights & Biases (Biewald, 2020)', but these citations do not provide specific version numbers for the software used, which is required for reproducibility. |
| Experiment Setup | Yes | The hyperparameter values of OpenCLIP were used as default, concretely, a learning rate of 1 × 10⁻³ and a weight decay of 0.1 for the Adam optimizer... Deviating from OpenCLIP, we used a batch size of 512... we set τ⁻¹ to a fixed value of 30... For modern Hopfield networks, the hyperparameter β was set to 8. ...the learning rate was set to 5 × 10⁻⁴ and the batch size to 1024 as used in OpenCLIP... For modern Hopfield networks, the hyperparameter β was set to 14.3. (These values are collected in the config sketch below.) |
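The paper's Pseudocode 1 presents the CLOOB objective (modern Hopfield retrieval followed by the InfoLOOB loss) in a PyTorch-like style. Below is a minimal, self-contained sketch of that objective, assuming L2-normalized encoder outputs; the function names `hopfield_retrieval` and `info_loob` and the toy random embeddings are illustrative, not taken from the authors' repository (https://github.com/ml-jku/cloob).

```python
import torch
import torch.nn.functional as F

def hopfield_retrieval(beta, stored, state):
    # One update step of a modern Hopfield network: retrieve from the
    # stored patterns, using the state patterns as queries; beta is the
    # inverse temperature of the retrieval softmax.
    attn = F.softmax(beta * state @ stored.T, dim=1)
    return F.normalize(attn @ stored, dim=1)

def info_loob(inv_tau, anchors, samples):
    # InfoLOOB ("leave one out bound"): like InfoNCE, but the positive
    # pair is excluded from the denominator.
    sims = inv_tau * anchors @ samples.T                  # [n, n] similarities
    pos = sims.diag()                                     # positive pairs
    mask = torch.eye(sims.size(0), dtype=torch.bool)
    neg = sims.masked_fill(mask, float("-inf"))           # drop positives
    return -(pos - torch.logsumexp(neg, dim=1)).mean()

# Toy embeddings standing in for image/text encoder outputs.
n, d = 512, 128
img = F.normalize(torch.randn(n, d), dim=1)
txt = F.normalize(torch.randn(n, d), dim=1)

beta, inv_tau = 8.0, 30.0  # values reported for the Conceptual Captions runs

# Retrieve both modalities from each memory (image memory and text memory).
u_img = hopfield_retrieval(beta, img, img)  # images retrieved from image memory
u_txt = hopfield_retrieval(beta, img, txt)  # texts retrieved from image memory
v_txt = hopfield_retrieval(beta, txt, txt)  # texts retrieved from text memory
v_img = hopfield_retrieval(beta, txt, img)  # images retrieved from text memory

loss = info_loob(inv_tau, u_img, u_txt) + info_loob(inv_tau, v_txt, v_img)
print(loss.item())
```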
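For quick reference, the hyperparameters quoted in the experiment-setup row can be collected into a plain config. This is a hedged summary of the reported values only; the key names are illustrative and are not taken from the ml-jku/cloob repository.

```python
# Pre-training hyperparameters as reported in the paper (key names are ours).
cloob_config = {
    "conceptual_captions": {
        "optimizer": "Adam",       # OpenCLIP defaults
        "learning_rate": 1e-3,
        "weight_decay": 0.1,
        "batch_size": 512,         # deviates from OpenCLIP
        "inv_tau": 30.0,           # fixed, not learned
        "hopfield_beta": 8.0,
    },
    "yfcc": {
        "optimizer": "Adam",
        "learning_rate": 5e-4,     # as used in OpenCLIP
        "weight_decay": 0.1,
        "batch_size": 1024,
        "hopfield_beta": 14.3,
    },
}
```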