Improving CLIP Training with Language Rewrites
Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CC3M, CC12M, RedCaps and LAION-400M datasets show that CLIP pre-training with language rewrites significantly improves the transfer performance without computation or memory overhead during training. |
| Researcher Affiliation | Collaboration | Google Research, MIT CSAIL |
| Pseudocode | No | The paper describes its methods using natural language and mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/LijieFan/LaCLIP. |
| Open Datasets | Yes | Our experiments were conducted on four different image-text datasets at different scales: Conceptual Captions 3M (CC3M) [51], Conceptual Captions 12M (CC12M) [7], RedCaps [15], and LAION-400M [49]. |
| Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter tuning on downstream tasks, but it does not provide explicit details about the train/validation/test splits (e.g., percentages or sample counts) for the primary pre-training datasets (CC3M, CC12M, Red Caps, LAION-400M). |
| Hardware Specification | Yes | The pre-training process was conducted on four machines with eight A100 GPUs each. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer', 'ViT architecture', 'Scikit-learn', 'torchvision', and 'VISSL', but it does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Table A3 provides an overview of the pre-training hyperparameters used for CLIP on all datasets. |
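
For context on the method assessed above: the paper's core idea is to augment CLIP pre-training on the text side by sampling, at each step, one caption per image from the original caption plus its pre-generated LLM rewrites. The sketch below illustrates that sampling step only; it is not the authors' implementation, and `clip_model`, `tokenize`, and `contrastive_loss` are hypothetical placeholders for whatever CLIP components a reader's codebase provides.

```python
# Minimal sketch (assumption: a standard CLIP training loop already exists).
# The only change vs. vanilla CLIP is picking a random caption variant per
# image each step, which is why no extra compute or memory is needed at
# training time, as the paper's abstract states.
import random


def sample_caption(caption_variants):
    """Pick one caption uniformly from the original plus its LLM rewrites."""
    return random.choice(caption_variants)


def training_step(clip_model, tokenize, contrastive_loss, images, caption_sets):
    # caption_sets: list of lists; each inner list holds the original caption
    # followed by its pre-generated rewrites for the matching image.
    texts = [sample_caption(variants) for variants in caption_sets]

    image_features = clip_model.encode_image(images)
    text_features = clip_model.encode_text(tokenize(texts))

    # Standard CLIP image-text contrastive loss over the batch.
    return contrastive_loss(image_features, text_features)
```

The caption rewrites themselves are generated offline with an LLM before pre-training, so the loop above only reads them from the dataset rather than invoking any model at train time.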