TiC-CLIP: Continual Training of CLIP Models
Authors: Saurabh Garg, Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first use our benchmarks to curate various dynamic evaluations to measure temporal robustness of existing models. We show OpenAI's CLIP (trained on data up to 2020) loses 8% zero-shot accuracy on our curated retrieval task from 2021–2022 compared with more recently trained models in the OpenCLIP repository. We then study how to efficiently train models on time-continuous data. We demonstrate that a simple rehearsal-based approach that continues training from the last checkpoint and replays old data reduces compute by 2.5× when compared to the standard practice of retraining from scratch. (A minimal sketch of this rehearsal strategy follows the table.) |
| Researcher Affiliation | Collaboration | Saurabh Garg (Carnegie Mellon University); Mehrdad Farajtabar, Hadi Pouransari, Raviteja Vemulapalli, Sachin Mehta, Oncel Tuzel, Vaishaal Shankar, Fartash Faghri (Apple) |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/apple/ml-tic-clip. |
| Open Datasets | Yes | We introduce TiC-DataComp, a new benchmark for Time-Continual training of CLIP models, which we create by appending crawl time information to the existing CommonPool dataset (Gadre et al., 2023). We also repurpose other web-scale datasets gathered from diverse sources, such as Reddit and Flickr. Specifically, we curate TiC-YFCC and TiC-RedCaps by leveraging time information available in YFCC (Thomee et al., 2016) and RedCaps (Desai et al., 2021) respectively. (A sketch of time-based bucketing follows the table.) |
| Dataset Splits | No | The paper discusses training and evaluation datasets, and uses terms like 'evaluation datasets' and 'test data', but does not explicitly define or provide details for a separate 'validation' split with specified percentages or counts for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions compute budgets in terms of MACs and refers to configurations from another paper (DataComp) for overall compute, but does not specify the exact hardware (e.g., specific GPU models, CPUs) used for its own experiments. The mention of 'A100 GPU hours' refers to training of ViT-g-14 in previous works, not the authors' own experimental setup. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'Vision Transformers (ViTs)' and refers to the 'OpenCLIP library' and the 'original CLIP training recipe (Radford et al., 2021)' for hyperparameters, but it does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA, or specific library versions). |
| Experiment Setup | Yes | The paper describes a 'streaming protocol' for data, specifies a 'Memory budget' and 'Compute budget', discusses a 'Learning rate schedule' (cosine decay with warm-up), and states that 'All training and hyperparameters can be found in App. D.2'. Table 1 also summarizes various method parameters like 'Train Size', 'Init.', and 'Compute'. (A generic sketch of the cosine warm-up schedule follows the table.) |
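
The rehearsal-based approach quoted in the Research Type row warm-starts each time step from the previous checkpoint and mixes replayed old data with the new data instead of retraining from scratch. Below is a minimal, hypothetical sketch of that loop; the names `build_training_pool`, `train_fn`, and `replay_fraction` are illustrative and not taken from the paper's code, and the paper fixes a compute budget per time step rather than the pool size used here.

```python
import random

def build_training_pool(new_data, old_data, replay_fraction=0.5, pool_size=None):
    """Mix the current time step's samples with replayed old samples.

    `replay_fraction` and `pool_size` are illustrative knobs, not values
    from the paper.
    """
    pool_size = pool_size or len(new_data)
    n_old = min(int(replay_fraction * pool_size), len(old_data))
    n_new = min(pool_size - n_old, len(new_data))
    pool = random.sample(old_data, n_old) + random.sample(new_data, n_new)
    random.shuffle(pool)
    return pool

def continual_training(timesteps, train_fn, init_checkpoint=None):
    """Hypothetical driver loop: continue from the last checkpoint at each
    time step and replay previously seen data."""
    checkpoint = init_checkpoint
    seen = []
    for step_data in timesteps:
        pool = build_training_pool(step_data, seen) if seen else list(step_data)
        checkpoint = train_fn(pool, init=checkpoint)  # warm-start from last weights
        seen.extend(step_data)
        yield checkpoint
```

Avoiding a full retrain at every time step is what underlies the reported 2.5× compute reduction relative to retraining from scratch.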
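The TiC-DataComp construction described in the Open Datasets row appends crawl-time metadata to CommonPool and splits the pool into time steps. The sketch below shows one way such bucketing could look; the field name `crawl_time` and the yearly granularity are assumptions for illustration, not the benchmark's actual metadata schema.

```python
from collections import defaultdict
from datetime import datetime

def bucket_by_year(samples):
    """Group image-text records into yearly buckets using a crawl timestamp.

    Each record is assumed to be a dict with an ISO-8601 'crawl_time' field;
    the real TiC-DataComp metadata format may differ.
    """
    buckets = defaultdict(list)
    for sample in samples:
        year = datetime.fromisoformat(sample["crawl_time"]).year
        buckets[year].append(sample)
    return dict(sorted(buckets.items()))

# Usage (hypothetical): timesteps = list(bucket_by_year(pool).values())
```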
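The Experiment Setup row mentions a cosine learning-rate decay with warm-up. The function below is a generic implementation of that schedule; the warm-up length, peak learning rate, and total steps are placeholders, since the paper's actual values are given in its App. D.2.

```python
import math

def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warm-up to `peak_lr`, then cosine decay down to `min_lr`."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Placeholder values, not the paper's hyperparameters:
lrs = [cosine_with_warmup(s, total_steps=10_000, warmup_steps=500, peak_lr=1e-3)
       for s in range(10_000)]
```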