Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

Authors: Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further support our theoretical findings with experiments, including a well-designed synthetic experiment and experiments on real datasets, including MSCOCO, Conceptual Captions, and shifted versions of ImageNet.
Researcher Affiliation | Academia | Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman; Department of Computer Science, University of California, Los Angeles; yihaoxue@g.ucla.edu, sjoshi804@cs.ucla.edu, nguyentuanhaidang@gmail.com, baharan@cs.ucla.edu
Pseudocode | No | The paper describes mathematical formulations and processes but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link for the release of open-source code for the described methodology.
Open Datasets | Yes | We validate our theoretical findings through experiments, including a well-designed synthetic experiment and an experiment involving training CLIP models on MSCOCO (Lin et al., 2014)/Conceptual Captions (Sharma et al., 2018) and evaluating them on shifted ImageNets.
Dataset Splits | Yes | The dataset is divided into Training, Validation, and Test splits. The Training split includes 3,318,333 pairs, of which a subset of 2,007,528 has machine-generated labels (Ng et al., 2020). ... To create the training and validation datasets, we split the subset with a 7:3 ratio in a stratified fashion.
Hardware Specification | Yes | Each experiment is run on 1 NVIDIA A6000.
Software Dependencies | No | The paper mentions using 'PyTorch' but does not specify its version number or any other software dependencies with specific version numbers.
Experiment Setup | Yes | We use momentum SGD as the optimizer with a learning rate of 0.01, weight decay of 0.001, momentum of 0.9, and a batch size of 128. The model is trained for 100 epochs.
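The Open Datasets row refers to training CLIP models on MSCOCO/Conceptual Captions. Since no code is released, the sketch below is only an illustration of the standard symmetric image-text contrastive (InfoNCE) objective used by CLIP-style training; the embedding tensors and the temperature value are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over a batch of paired embeddings."""
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity logits between every image and every caption in the batch.
    logits = image_emb @ text_emb.t() / temperature

    # Matching image-caption pairs lie on the diagonal.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```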
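The Dataset Splits row quotes a stratified 7:3 train/validation split of the machine-labeled subset. A minimal sketch of such a split using scikit-learn follows; the arrays, label values, and random seed are placeholders, since the paper does not specify the exact procedure or tooling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the machine-labeled subset of 2,007,528 pairs;
# the actual data structures and labels are not described in the paper.
pair_indices = np.arange(2_007_528)
labels = np.random.randint(0, 10, size=pair_indices.shape)  # hypothetical label ids

train_idx, val_idx = train_test_split(
    pair_indices,
    train_size=0.7,      # 7:3 training/validation ratio stated in the paper
    stratify=labels,     # stratified split, as described
    random_state=0,      # illustrative seed; not specified in the paper
)
```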
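The Experiment Setup row lists the reported optimizer hyperparameters. The sketch below wires them into a PyTorch training loop for concreteness; the model, dataset, and loss are stand-ins, as only the optimizer settings, batch size, and epoch count are given in the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the actual model and training data, which are not released.
model = torch.nn.Linear(512, 512)
train_dataset = TensorDataset(torch.randn(1024, 512))

# Momentum SGD with the hyperparameters reported in the paper.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,             # learning rate
    momentum=0.9,        # momentum
    weight_decay=0.001,  # weight decay
)
loader = DataLoader(train_dataset, batch_size=128, shuffle=True)

for epoch in range(100):                  # trained for 100 epochs
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean() # placeholder loss, not the paper's objective
        loss.backward()
        optimizer.step()
```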