OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

Authors: Liangliang Shi, Jack Fan, Junchi Yan

Venue: ICML 2024

Reproducibility assessment: each variable, its result, and the supporting LLM response.
Research Type: Experimental
Evidence: "We show their superior performance on public datasets for downstream tasks in both image and text domain. ... The code is available at https://github.com/fan23j/ICML2024-OT-CLIP. ... 3.3. Experiments on CLIP Training ... 4.4. Experiments on CLIP Inference"

Researcher Affiliation: Academia
Evidence: "1 School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University, Shanghai, China; 2 Department of Computer Science, University of North Carolina at Chapel Hill."

Pseudocode: Yes
Evidence: "Algorithm 1: PyTorch-style pseudocode for Entropic OT"

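For reference, below is a minimal PyTorch sketch of entropic OT computed via Sinkhorn iterations, the algorithm the pseudocode row refers to. It is an illustration under standard assumptions (uniform marginals, a dense cost matrix), not a reproduction of the paper's Algorithm 1; the function name and the defaults `eps` and `n_iters` are ours.

```python
import torch

def sinkhorn(cost: torch.Tensor, eps: float = 0.1, n_iters: int = 50) -> torch.Tensor:
    """Entropic OT via Sinkhorn iterations (sketch; not the paper's Algorithm 1).

    cost: (n, m) cost matrix, e.g. 1 - cosine similarity between
          a batch's image and text embeddings.
    Returns an (n, m) transport plan with uniform marginals.
    """
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n, device=cost.device)  # source marginal
    b = torch.full((m,), 1.0 / m, device=cost.device)  # target marginal
    K = torch.exp(-cost / eps)                         # Gibbs kernel
    u = torch.ones_like(a)
    v = torch.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.t() @ u)
    # Transport plan P = diag(u) K diag(v).
    return u[:, None] * K * v[None, :]

# Illustrative usage: sim = torch.randn(32, 32); plan = sinkhorn(1.0 - sim)
```

In a CLIP-style batch, the resulting plan can serve as a soft matching between images and texts; `eps` controls the strength of the entropic regularization (smaller values give a sharper, more permutation-like plan).
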
Open Source Code: Yes
Evidence: "The code is available at https://github.com/fan23j/ICML2024-OT-CLIP."

Open Datasets: Yes
Evidence: "Our models are pretrained on the popular Conceptual Captions 3M (CC3M) (Sharma et al., 2018) image-text pairs and primarily evaluated on ImageNet1K (Deng et al., 2009) zero-shot classification."

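As context for the evaluation protocol quoted above, here is a generic sketch of CLIP-style zero-shot scoring. The feature tensors, temperature, and prompt convention are placeholder assumptions; the snippet is not taken from the OT-CLIP repository.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_logits(image_features: torch.Tensor,
                     text_features: torch.Tensor,
                     temperature: float = 0.01) -> torch.Tensor:
    """Generic CLIP-style zero-shot scoring (sketch; not the paper's code).

    image_features: (B, D) embeddings from the vision encoder.
    text_features:  (C, D) embeddings of one prompt per class,
                    e.g. "a photo of a {class name}" for ImageNet classes.
    Returns (B, C) logits; argmax over classes gives the prediction.
    """
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    return image_features @ text_features.t() / temperature

# Illustrative usage: logits = zero_shot_logits(torch.randn(8, 512), torch.randn(1000, 512))
```
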
Dataset Splits: No
Evidence: The paper mentions "validation sets" for generating long-tailed distributions but does not provide specific train/validation/test splits (e.g., percentages or counts) for its experiments, nor does it cite a predefined split used for validation.

Hardware Specification: Yes
Evidence: "For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024."

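The quoted hardware line implies standard data-parallel training: four processes, one per V100, each with a per-GPU batch of 256, yielding the effective batch of 1024. A hedged sketch using torch.distributed; the dataset and model below are dummy stand-ins, not the paper's pipeline.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Hypothetical launch: `torchrun --nproc_per_node=4 train.py` (one process per GPU).
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Dummy stand-ins; the real pipeline would load CC3M image-text pairs and a CLIP model.
dataset = TensorDataset(torch.randn(4096, 512))
model = torch.nn.Linear(512, 512).cuda(local_rank)

per_gpu_batch = 256                    # batch size per GPU, as reported
sampler = DistributedSampler(dataset)  # shards the data across the 4 ranks
loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)
ddp_model = DDP(model, device_ids=[local_rank])
# Effective batch size = per_gpu_batch * world_size = 256 * 4 = 1024.
```
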
Software Dependencies: No
Evidence: The paper provides "PyTorch-style pseudocode" but does not specify version numbers for PyTorch, Python, or any other software libraries used.

Experiment Setup: Yes
Evidence: "For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024. We use learning rate lr = 5e-4 with the AdamW optimizer and weight decay of 0.1 in all our experiments."
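The quoted hyperparameters map directly onto torch.optim.AdamW; a minimal sketch with a placeholder model (the quote does not specify a learning-rate schedule, so none is shown):

```python
import torch

model = torch.nn.Linear(512, 512)  # placeholder; the paper trains a CLIP model

# As quoted: AdamW with lr = 5e-4 and weight decay 0.1, trained for 30 epochs.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
num_epochs = 30
```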