Improving CLIP Training with Language Rewrites

Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the CC3M, CC12M, RedCaps, and LAION-400M datasets show that CLIP pre-training with language rewrites significantly improves transfer performance without computation or memory overhead during training.
Researcher Affiliation | Collaboration | Google Research and MIT CSAIL.
Pseudocode | No | The paper describes its methods in natural language and mathematical equations but does not include any pseudocode or algorithm blocks (a hedged illustrative sketch is given after the table).
Open Source Code | Yes | Code is available at https://github.com/LijieFan/LaCLIP.
Open Datasets | Yes | Our experiments were conducted on four image-text datasets of different scales: Conceptual Captions 3M (CC3M) [51], Conceptual Captions 12M (CC12M) [7], RedCaps [15], and LAION-400M [49].
Dataset Splits | No | The paper mentions using a validation set for hyperparameter tuning on downstream tasks, but it does not give explicit train/validation/test splits (e.g., percentages or sample counts) for the primary pre-training datasets (CC3M, CC12M, RedCaps, LAION-400M).
Hardware Specification | Yes | The pre-training process was conducted on four machines with eight A100 GPUs each.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, the ViT architecture, Scikit-learn, torchvision, and VISSL, but it does not provide version numbers for these or other key dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table A3 provides an overview of the pre-training hyperparameters used for CLIP on all datasets (an illustrative, assumption-only configuration sketch follows the table).
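
Since the paper contains no pseudocode, the following is a minimal, hypothetical sketch of the training idea it describes: each image is paired at random with either its original caption or one of its LLM-rewritten captions, and the standard symmetric CLIP contrastive loss is then applied. The names used here (`sample_caption`, `clip_contrastive_loss`, `training_step`, `image_encoder`, `text_encoder`, `tokenizer`) are illustrative placeholders, not the authors' released API.

```python
# Hypothetical sketch of CLIP training with language rewrites: one caption per
# image is sampled from {original caption, LLM rewrites} as text augmentation,
# then a standard symmetric InfoNCE loss is computed. Encoders, tokenizer and
# the rewrite generation itself are stubbed out as placeholders.

import random
import torch
import torch.nn.functional as F


def sample_caption(captions: list[str]) -> str:
    """Pick the original caption or one of its rewrites, uniformly at random."""
    return random.choice(captions)


def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric image-text InfoNCE loss over a batch, as in standard CLIP."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def training_step(images, caption_sets, image_encoder, text_encoder, tokenizer):
    """One hypothetical training step with language-rewrite text augmentation.

    `caption_sets` is a list whose i-th element holds the original caption and
    its LLM rewrites for the i-th image in `images`.
    """
    sampled = [sample_caption(c) for c in caption_sets]
    text_tokens = tokenizer(sampled)            # placeholder tokenizer
    image_features = image_encoder(images)      # e.g. a ViT image tower
    text_features = text_encoder(text_tokens)   # e.g. a Transformer text tower
    return clip_contrastive_loss(image_features, text_features)
```

Because the rewrites are produced offline and only one caption per image is tokenized each step, this kind of text augmentation adds no training-time compute or memory overhead, consistent with the claim quoted in the Research Type row.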
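
Table A3 of the paper lists the actual pre-training hyperparameters and is not reproduced here. Purely as an illustration of how such a setup is commonly recorded for reproducibility checks, the sketch below defines a configuration object; every value is an assumed placeholder (the only detail taken from the paper is the use of the AdamW optimizer), not a number from Table A3.

```python
# Hypothetical layout for a CLIP pre-training configuration. Field names mirror
# the kinds of hyperparameters a table like the paper's Table A3 would cover;
# all values below are assumed placeholders, not the paper's settings.

from dataclasses import dataclass


@dataclass
class PretrainConfig:
    model: str = "ViT-B/16"          # vision backbone (assumed)
    optimizer: str = "AdamW"         # optimizer named in the paper
    learning_rate: float = 1e-3      # placeholder value
    weight_decay: float = 0.1        # placeholder value
    warmup_steps: int = 2000         # placeholder value
    batch_size: int = 4096           # placeholder value
    epochs: int = 32                 # placeholder value
    temperature_init: float = 0.07   # common CLIP default


if __name__ == "__main__":
    print(PretrainConfig())
```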