Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Authors: Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, Neil Houlsby
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate remarkable performance improvement over dense models of equivalent computational cost. LIMoE-L/16 trained comparably to CLIP-L/14 achieves 78.6% zero-shot ImageNet accuracy (vs. 76.2%), and when further scaled to H/14 (with additional data) it achieves 84.1%, comparable to state-of-the-art methods which use larger custom per-modality backbones and pre-training schemes. We analyse the quantitative and qualitative behavior of LIMoE, and demonstrate phenomena such as differing treatment of the modalities and the organic emergence of modality-specific experts. |
| Researcher Affiliation | Industry | Basil Mustafa , Carlos Riquelme*, Joan Puigcerver*, Rodolphe Jenatton, Neil Houlsby Google Brain {basilm, rikel, jpuigcerver, rjenatton, neilhoulsby}@google.com |
| Pseudocode | No | The paper describes algorithms and formulations (e.g., equation 1 for contrastive training objective, equation 2 for entropy losses), but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide links to a code repository. |
| Open Datasets | Yes | Training data. By default, all models are trained on paired image-text data used in [16], consisting of 3.6B images and alt-texts scraped from the web. For the large LIMoE-H/14 experiment, we also co-train with JFT-4B [17]. |
| Dataset Splits | No | The paper mentions 'Validation accuracy' in Table 3, but it does not provide specific details on how training, validation, and test splits were performed for the datasets used to reproduce the experiments (e.g., percentages, counts, or citations to predefined splits for the data itself). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We train a range of LIMoE models at batch size 16k for 781k steps. ... In particular, we train a 32-layer LIMoE-H/14 ... It was trained at a batch size of 21k ... We train B/16 models at batch size 8096 for 100,000 steps. |
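The Pseudocode row notes that the paper's equation 1 is a contrastive training objective but provides no algorithm block. For context, a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss is given below; the function names, NumPy implementation, and temperature value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    A sketch of the CLIP-style objective the paper's equation 1 refers to;
    the temperature and all names here are illustrative.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix
    labels = np.arange(len(logits))                # matching pairs lie on the diagonal

    def cross_entropy(l, y):
        # Numerically stable log-softmax followed by mean negative log-likelihood.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Perfectly aligned image/text embedding pairs drive this loss toward zero, while mismatched pairs raise it, which is the behaviour the zero-shot ImageNet evaluation relies on.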