Improving CLIP Training with Language Rewrites

Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the CC3M, CC12M, RedCaps, and LAION-400M datasets show that CLIP pre-training with language rewrites significantly improves transfer performance without computation or memory overhead during training.
Researcher Affiliation | Collaboration | Google Research and MIT CSAIL.
Pseudocode | No | The paper describes its methods in natural language and mathematical equations but does not include any pseudocode or algorithm blocks (a hedged illustrative sketch is given after the table).
Open Source Code | Yes | Code is available at https://github.com/LijieFan/LaCLIP.
Open Datasets | Yes | Our experiments were conducted on four image-text datasets of different scales: Conceptual Captions 3M (CC3M) [51], Conceptual Captions 12M (CC12M) [7], RedCaps [15], and LAION-400M [49].
Dataset Splits | No | The paper mentions using a validation set for hyperparameter tuning on downstream tasks, but it does not give explicit train/validation/test splits (e.g., percentages or sample counts) for the primary pre-training datasets (CC3M, CC12M, RedCaps, LAION-400M).
Hardware Specification | Yes | The pre-training process was conducted on four machines with eight A100 GPUs each.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, the ViT architecture, Scikit-learn, torchvision, and VISSL, but it does not provide version numbers for these or other key dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table A3 provides an overview of the pre-training hyperparameters used for CLIP on all datasets (an illustrative, assumption-only configuration sketch follows the table).
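
Since the paper contains no pseudocode, the following is a minimal, hypothetical sketch of the training idea it describes: each image is paired at random with either its original caption or one of its LLM-rewritten captions, and the standard symmetric CLIP contrastive loss is then applied. The names used here (`sample_caption`, `clip_contrastive_loss`, `training_step`, `image_encoder`, `text_encoder`, `tokenizer`) are illustrative placeholders, not the authors' released API.

```python
# Hypothetical sketch of CLIP training with language rewrites: one caption per
# image is sampled from {original caption, LLM rewrites} as text augmentation,
# then a standard symmetric InfoNCE loss is computed. Encoders, tokenizer and
# the rewrite generation itself are stubbed out as placeholders.

import random
import torch
import torch.nn.functional as F


def sample_caption(captions: list[str]) -> str:
    """Pick the original caption or one of its rewrites, uniformly at random."""
    return random.choice(captions)


def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric image-text InfoNCE loss over a batch, as in standard CLIP."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def training_step(images, caption_sets, image_encoder, text_encoder, tokenizer):
    """One hypothetical training step with language-rewrite text augmentation.

    `caption_sets` is a list whose i-th element holds the original caption and
    its LLM rewrites for the i-th image in `images`.
    """
    sampled = [sample_caption(c) for c in caption_sets]
    text_tokens = tokenizer(sampled)            # placeholder tokenizer
    image_features = image_encoder(images)      # e.g. a ViT image tower
    text_features = text_encoder(text_tokens)   # e.g. a Transformer text tower
    return clip_contrastive_loss(image_features, text_features)
```

Because the rewrites are produced offline and only one caption per image is tokenized each step, this kind of text augmentation adds no training-time compute or memory overhead, consistent with the claim quoted in the Research Type row.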
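
Table A3 of the paper lists the actual pre-training hyperparameters and is not reproduced here. Purely as an illustration of how such a setup is commonly recorded for reproducibility checks, the sketch below defines a configuration object; every value is an assumed placeholder (the only detail taken from the paper is the use of the AdamW optimizer), not a number from Table A3.

```python
# Hypothetical layout for a CLIP pre-training configuration. Field names mirror
# the kinds of hyperparameters a table like the paper's Table A3 would cover;
# all values below are assumed placeholders, not the paper's settings.

from dataclasses import dataclass


@dataclass
class PretrainConfig:
    model: str = "ViT-B/16"          # vision backbone (assumed)
    optimizer: str = "AdamW"         # optimizer named in the paper
    learning_rate: float = 1e-3      # placeholder value
    weight_decay: float = 0.1        # placeholder value
    warmup_steps: int = 2000         # placeholder value
    batch_size: int = 4096           # placeholder value
    epochs: int = 32                 # placeholder value
    temperature_init: float = 0.07   # common CLIP default


if __name__ == "__main__":
    print(PretrainConfig())
```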