Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

Authors: Liangliang Shi, Jack Fan, Junchi Yan

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show their superior performance on public datasets for downstream tasks in both image and text domain. ... The code is available at https://github.com/fan23j/ICML2024-OT-CLIP. ... 3.3. Experiments on CLIP Training ... 4.4. Experiments on CLIP Inference" |
| Researcher Affiliation | Academia | "1 School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University, Shanghai, China; 2 Department of Computer Science, University of North Carolina at Chapel Hill" |
| Pseudocode | Yes | "Algorithm 1: PyTorch-style pseudocode for Entropic OT" |
| Open Source Code | Yes | "The code is available at https://github.com/fan23j/ICML2024-OT-CLIP." |
| Open Datasets | Yes | "Our models are pretrained on the popular Conceptual Captions 3M (CC3M) (Sharma et al., 2018) image-text pairs and primarily evaluated on ImageNet1K (Deng et al., 2009) zero-shot classification." |
| Dataset Splits | No | The paper mentions "validation sets" for generating long-tailed distributions but does not provide specific train/validation/test splits (e.g., percentages or counts) for its experiments, nor does it cite a predefined split used for validation. |
| Hardware Specification | Yes | "For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024." |
| Software Dependencies | No | The paper provides "PyTorch-style pseudocode" but does not specify version numbers for PyTorch, Python, or any other software libraries used. |
| Experiment Setup | Yes | "For each experiment, we train for 30 epochs with a batch size of 256 across 4 32GB V100 GPUs for an effective batch size of 1024. We use learning rate lr = 5e-4 with the AdamW optimizer and weight decay of 0.1 in all our experiments." |
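The Pseudocode row above refers to the paper's Algorithm 1, PyTorch-style pseudocode for entropic OT. For context, entropic OT is typically solved with Sinkhorn iterations; the sketch below is a generic NumPy illustration of that standard algorithm, not a reproduction of the paper's Algorithm 1 (the function name, `eps`, and iteration count are our own illustrative choices).

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iters=200):
    """Generic entropic OT via Sinkhorn iterations (illustrative, not the paper's code).

    C: (n, m) cost matrix; a: (n,) source marginal; b: (m,) target marginal.
    Returns the (n, m) entropic transport plan P = diag(u) K diag(v).
    """
    K = np.exp(-C / eps)              # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)             # scale columns to match target marginal
        u = a / (K @ v)               # scale rows to match source marginal
    return u[:, None] * K * v[None, :]

# Usage: a 2x2 cost matrix with uniform marginals; the plan should
# concentrate mass on the low-cost diagonal while matching both marginals.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
a = b = np.array([0.5, 0.5])
P = sinkhorn(C, a, b)
assert np.allclose(P.sum(axis=1), a) and np.allclose(P.sum(axis=0), b)
```

In CLIP-style training, C would be derived from image–text similarity logits, and the resulting plan serves as a soft matching between the two modalities.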