Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving CLIP Training with Language Rewrites

Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CC3M, CC12M, RedCaps and LAION-400M datasets show that CLIP pre-training with language rewrites significantly improves the transfer performance without computation or memory overhead during training.
Researcher Affiliation | Collaboration | 1 Google Research, 2 MIT CSAIL
Pseudocode | No | The paper describes its methods using natural language and mathematical equations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/LijieFan/LaCLIP.
Open Datasets | Yes | Our experiments were conducted on four different image-text datasets at different scales: Conceptual Captions 3M (CC3M) [51], Conceptual Captions 12M (CC12M) [7], RedCaps [15], and LAION-400M [49].
Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter tuning on downstream tasks, but it does not provide explicit details about the train/validation/test splits (e.g., percentages or sample counts) for the primary pre-training datasets (CC3M, CC12M, RedCaps, LAION-400M).
Hardware Specification | Yes | The pre-training process was conducted on four machines with eight A100 GPUs each.
Software Dependencies | No | The paper mentions software components like 'AdamW optimizer', 'ViT architecture', 'Scikit-learn', 'torchvision', and 'VISSL', but it does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table A3 provides an overview of the pre-training hyperparameters used for CLIP on all datasets.
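The core idea the table summarizes, training CLIP while randomly substituting LLM-rewritten captions for the originals, can be sketched in a few lines. This is an illustrative reconstruction, not the authors' released code; the function name `sample_caption` and the data layout are assumptions made here for clarity.

```python
import random

def sample_caption(original, rewrites, rng=random):
    """Draw one caption uniformly from the original plus its rewrites.

    Per the paper's description, each image's text is re-sampled from
    {original caption} union {LLM rewrites} during pre-training, so the
    text encoder sees diverse phrasings with no extra compute or memory
    overhead at training time (rewrites are generated offline).
    """
    return rng.choice([original] + list(rewrites))
```

In an actual training loop this sampling would happen inside the dataset's `__getitem__`, so every epoch pairs each image with a freshly chosen caption variant before the standard CLIP contrastive loss is computed.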