Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving CLIP Training with Language Rewrites
Authors: Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, Yonglong Tian
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CC3M, CC12M, RedCaps and LAION-400M datasets show that CLIP pre-training with language rewrites significantly improves the transfer performance without computation or memory overhead during training. |
| Researcher Affiliation | Collaboration | 1Google Research, 2MIT CSAIL |
| Pseudocode | No | The paper describes its methods using natural language and mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/LijieFan/LaCLIP. |
| Open Datasets | Yes | Our experiments were conducted on four different image-text datasets at different scale: Conceptual Captions 3M (CC3M) [51], Conceptual Captions 12M (CC12M) [7], RedCaps [15], and LAION-400M [49]. |
| Dataset Splits | No | The paper mentions using a 'validation set' for hyperparameter tuning on downstream tasks, but it does not provide explicit details about the train/validation/test splits (e.g., percentages or sample counts) for the primary pre-training datasets (CC3M, CC12M, RedCaps, LAION-400M). |
| Hardware Specification | Yes | The pre-training process was conducted on four machines with eight A100 GPUs each. |
| Software Dependencies | No | The paper mentions software components like 'AdamW optimizer', 'ViT architecture', 'Scikit-learn', 'torchvision', and 'VISSL', but it does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | Table A3 provides an overview of the pre-training hyperparameters used for CLIP on all datasets. |