OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport
Authors: Liangliang Shi, Jack Fan, Junchi Yan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show their superior performance on public datasets for downstream tasks in both image and text domain. ... The code is available at https://github.com/fan23j/ICML2024-OT-CLIP. ... 3.3. Experiments on CLIP Training ... 4.4. Experiments on CLIP Inference |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University, Shanghai, China; (2) Department of Computer Science, University of North Carolina at Chapel Hill. |
| Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode for Entropic OT (see the Sinkhorn sketch below this table). |
| Open Source Code | Yes | The code is available at https://github.com/fan23j/ICML2024-OT-CLIP. |
| Open Datasets | Yes | Our models are pretrained on the popular Conceptual Captions 3M (CC3M) (Sharma et al., 2018) image-text pairs and primarily evaluated on ImageNet1K (Deng et al., 2009) zero-shot classification. |
| Dataset Splits | No | The paper mentions "validation sets" for generating long-tailed distributions but does not provide specific train/validation/test splits (e.g., percentages or counts) for their experiments, nor does it cite a predefined split used for validation. |
| Hardware Specification | Yes | For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024. |
| Software Dependencies | No | The paper provides 'PyTorch-style pseudocode' but does not specify version numbers for PyTorch, Python, or any other software libraries used. |
| Experiment Setup | Yes | For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024. We use learning rate lr = 5e-4 with the AdamW optimizer and weight decay of 0.1 in all our experiments. (See the optimizer sketch below this table.) |
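The paper's Algorithm 1 provides PyTorch-style pseudocode for Entropic OT, which this report quotes by title only. Below is a minimal Sinkhorn-style sketch of entropic-regularized optimal transport in PyTorch; the function name `entropic_ot`, the defaults `eps=0.1` and `n_iters=100`, and the uniform marginals are illustrative assumptions, not the authors' exact algorithm (their code is at https://github.com/fan23j/ICML2024-OT-CLIP).

```python
import torch

def entropic_ot(cost: torch.Tensor, eps: float = 0.1, n_iters: int = 100) -> torch.Tensor:
    """Sinkhorn iterations for entropic-regularized OT with uniform marginals.

    A minimal sketch, assuming uniform row/column marginals and illustrative
    defaults for eps and n_iters; not the paper's exact Algorithm 1.
    """
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n, device=cost.device)  # uniform row marginal
    b = torch.full((m,), 1.0 / m, device=cost.device)  # uniform column marginal
    K = torch.exp(-cost / eps)                         # Gibbs kernel
    u = torch.ones(n, device=cost.device)
    for _ in range(n_iters):
        v = b / (K.t() @ u)                            # column scaling update
        u = a / (K @ v)                                # row scaling update
    return u.unsqueeze(1) * K * v.unsqueeze(0)         # transport plan P

# Usage on a batch of CLIP features (hypothetical names):
#   cost = 1 - image_feats @ text_feats.t()  # cosine-style cost matrix
#   plan = entropic_ot(cost)                 # soft image-text matching
```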
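For the reported training configuration (AdamW, lr = 5e-4, weight decay 0.1), a hedged setup sketch follows; `model` is a stand-in placeholder, since the report does not specify the encoder architecture beyond CLIP.

```python
import torch

# Placeholder module standing in for the CLIP model being trained; the
# report does not name the exact encoder, so this is an assumption.
model = torch.nn.Linear(512, 512)

# Reported hyperparameters: AdamW optimizer, lr = 5e-4, weight decay 0.1.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)
```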