Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders
Authors: Bumsoo Kim, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our extensive experiments, we validate that there is a sweet spot between expedition and distillation where the partial view from the expedited online image encoder interacts complementarily with the momentum teacher. As a result, ECLIPSE outperforms its counterparts while achieving substantial acceleration in inference speed. (A sketch of the momentum-teacher update follows this table.) |
| Researcher Affiliation | Industry | Bumsoo Kim*, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim (LG AI Research); *correspondence to: bumsoo.kim@lgresearch.ai |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | For implementation details, our work is built on top of the open-source SLIP codebase (Mu et al. 2021). For DeCLIP (Li et al. 2022), we follow the implementation details of the official code release. Footnotes link to 'https://github.com/facebookresearch/SLIP' and 'https://github.com/Sense-GVT/DeCLIP', which are external codebases, not the authors' own code for ECLIPSE. |
| Open Datasets | Yes | we pretrain ECLIPSE on large-scale open-source datasets, CC (Conceptual Captions) 3M (Sharma et al. 2018) and YFCC (Yahoo Flickr Creative Commons) 15M (Thomee et al. 2016). |
| Dataset Splits | No | The paper mentions pretraining on CC3M and YFCC15M datasets and evaluating on downstream datasets, but it does not explicitly state the training, validation, and test splits for the pretraining datasets. |
| Hardware Specification | Yes | All of our models are pretrained in 16 A100 GPUs. |
| Software Dependencies | No | The paper mentions building on 'open-source SLIP codebase' and following 'official code release' for De CLIP, but it does not specify version numbers for Python, PyTorch, CUDA, or other specific software libraries. |
| Experiment Setup | Yes | "All models are pretrained on the CC3M dataset with a learning rate of 5e-4 for 40 epochs. We use κ = 0.7 for EViT with a ViT-B/16 backbone." and "We use m = 0.994 in our experiments." (Sketches of the momentum update and the κ token keep-rate follow this table.) |
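
The Research Type and Experiment Setup rows above refer to a momentum teacher updated with m = 0.994. As a point of reference, here is a minimal PyTorch sketch of such an exponential-moving-average (EMA) teacher update, assuming the common BYOL/MoCo-style formulation; the function and argument names are illustrative, not the authors' ECLIPSE code.

```python
import torch

@torch.no_grad()
def update_momentum_teacher(online_encoder, teacher_encoder, m=0.994):
    # EMA update: teacher <- m * teacher + (1 - m) * online.
    # m = 0.994 is the value reported in the paper; the module names
    # `online_encoder` / `teacher_encoder` are illustrative assumptions.
    for p_online, p_teacher in zip(online_encoder.parameters(),
                                   teacher_encoder.parameters()):
        p_teacher.mul_(m).add_(p_online, alpha=1.0 - m)
```

In this formulation the teacher is initialized as a frozen copy of the online encoder and the update is called once per training step, so the teacher lags the online encoder smoothly rather than being trained by gradient descent.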
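
The Experiment Setup row also quotes κ = 0.7 for EViT, i.e., the fraction of patch tokens the expedited encoder keeps at each reduction stage. The sketch below shows EViT-style selection of the top-κ patch tokens by [CLS] attention; the real EViT additionally fuses the discarded tokens into a single extra token, which this sketch omits, and all names here are assumptions rather than the authors' implementation.

```python
import torch

def keep_attentive_tokens(tokens, cls_attn, kappa=0.7):
    """Keep the top-kappa fraction of patch tokens, ranked by the
    attention the [CLS] token pays to each patch (EViT-style).

    tokens:   (B, 1 + N, D) sequence with [CLS] at index 0
    cls_attn: (B, N) attention weights from [CLS] to the N patches
    """
    B, n_tokens, D = tokens.shape
    n_keep = int(kappa * (n_tokens - 1))  # kappa = 0.7 in the paper
    # Indices of the n_keep most-attended patches, restored to their
    # original spatial order after top-k selection.
    idx = cls_attn.topk(n_keep, dim=1).indices.sort(dim=1).values
    patches = tokens[:, 1:, :]
    kept = patches.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))
    # Re-attach the [CLS] token in front of the kept patch tokens.
    return torch.cat([tokens[:, :1, :], kept], dim=1)
```

With a ViT-B/16 backbone at 224×224 input (N = 196 patch tokens), κ = 0.7 keeps 137 patch tokens per stage, which is where the expedited encoder's speedup comes from.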