On the Generalization of Multi-modal Contrastive Learning
Authors: Qi Zhang, Yifei Wang, Yisen Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose the first theoretical analysis on the generalization ability of MMCL... Based on this perspective, we compare MMCL and SSCL on real-world data and show that text-induced positive pairs have better semantic consistency and diversity... which validates our understanding of the superiority of multi-modal positive pairs. ...we propose four different techniques and they all bring improvements (as much as 6.2%) on ImageNet. |
| Researcher Affiliation | Academia | (1) National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; (2) School of Mathematical Sciences, Peking University; (3) Institute for Artificial Intelligence, Peking University. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/PKU-ML/CLIP-Help-SimCLR. |
| Open Datasets | Yes | we pretrain the same backbone ViT-B (Dosovitskiy et al., 2021) on the same dataset, YFCC15M (Thomee et al., 2016; Radford et al., 2021), and evaluate the learned representations on ImageNet (Deng et al., 2009a). |
| Dataset Splits | Yes | For efficiency, we randomly draw 1,000 samples from 10 random classes of the ImageNet validation set. (A hedged sketch of this subset sampling appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like SimCLR, CLIP, ResNet-50, and specific optimizers (LARS, SGD), but it does not provide specific version numbers for these software dependencies or programming languages used. |
| Experiment Setup | Yes | We train the encoder for 100 epochs on ImageNet with a batch size of 512 and use the LARS optimizer with a cosine annealed learning rate schedule. ...we train a linear classifier following the frozen backbones and optimize the Cross Entropy loss with the SGD optimizer. (A hedged sketch of this recipe appears below the table.) |
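
The experiment-setup row above only states the optimizer, schedule, epoch count, and batch size; the sketch below shows how such a recipe might look in PyTorch. It is a minimal illustration, not the authors' code: LARS is not shipped with core PyTorch, so plain SGD stands in for it, and the learning rate, momentum, and weight decay values are assumptions rather than values reported in the paper.

```python
# Hedged sketch of the quoted pretraining / linear-probe recipe.
# Assumptions (not from the paper): lr, momentum, weight decay, and
# SGD standing in for LARS (LARS is not part of core PyTorch).
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR


def make_pretrain_optimizer(encoder: nn.Module, epochs: int = 100):
    """Optimizer + cosine-annealed schedule for the 100-epoch pretraining run.

    The paper uses LARS with a cosine schedule; SGD is a stand-in here,
    and the hyperparameter values are illustrative guesses.
    """
    opt = SGD(encoder.parameters(), lr=0.3, momentum=0.9, weight_decay=1e-6)
    sched = CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched


def make_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int = 1000):
    """Freeze the pretrained backbone and attach a linear classifier,
    trained with cross-entropy and SGD as described in the quote."""
    for p in backbone.parameters():
        p.requires_grad = False
    classifier = nn.Linear(feat_dim, num_classes)
    criterion = nn.CrossEntropyLoss()
    opt = SGD(classifier.parameters(), lr=0.1, momentum=0.9)  # assumed values
    return classifier, criterion, opt
```

In practice the returned optimizer and scheduler would be stepped inside the usual contrastive-training loop with the stated batch size of 512; that loop and the data pipeline are omitted here.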
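
The dataset-splits row quotes only one sentence about the evaluation subset (1,000 samples from 10 random classes of the ImageNet validation set). The sketch below shows one way such a subset could be drawn; the `(image_path, class_label)` list format, the seed, and the `min(...)` guard are assumptions introduced here for illustration.

```python
# Hedged sketch of the evaluation-subset construction from the
# "Dataset Splits" row: 1,000 samples from 10 random classes of the
# ImageNet validation set. Input format is an assumption.
import random
from collections import defaultdict


def sample_validation_subset(samples, num_classes=10, num_samples=1000, seed=0):
    """samples: list of (image_path, class_label) pairs for the validation set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    # Pick the random classes first, then pool their images.
    chosen_classes = rng.sample(sorted(by_class), num_classes)
    pool = [item for c in chosen_classes for item in by_class[c]]
    # Guard against the pool being smaller than the requested subset size.
    return rng.sample(pool, min(num_samples, len(pool)))
```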