Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders

Authors: Bumsoo Kim, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through our extensive experiments, we validate that there is a sweet spot between expedition and distillation where the partial view from the expedited online image encoder interacts complementarily with the momentum teacher. As a result, ECLIPSE outperforms its counterparts while achieving substantial acceleration in inference speed.
Researcher Affiliation | Industry | Bumsoo Kim*, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim, LG AI Research. *Correspondence to: bumsoo.kim@lgresearch.ai
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | For implementation details, our work is built on top of the open-source SLIP codebase (Mu et al. 2021). For DeCLIP (Li et al. 2022), we follow the implementation details of the official code release. The footnotes link to https://github.com/facebookresearch/SLIP and https://github.com/Sense-GVT/DeCLIP, which are external codebases, not the authors' own code for ECLIPSE.
Open Datasets | Yes | We pretrain ECLIPSE on large-scale open-source datasets, CC (Conceptual Captions) 3M (Sharma et al. 2018) and YFCC (Yahoo Flickr Creative Commons) 15M (Thomee et al. 2016).
Dataset Splits | No | The paper mentions pretraining on the CC3M and YFCC15M datasets and evaluating on downstream datasets, but it does not explicitly state training, validation, and test splits for the pretraining datasets.
Hardware Specification | Yes | All of our models are pretrained on 16 A100 GPUs.
Software Dependencies | No | The paper mentions building on the open-source SLIP codebase and following the official DeCLIP code release, but it does not specify version numbers for Python, PyTorch, CUDA, or other software libraries.
Experiment Setup | Yes | All models are pretrained on the CC3M dataset with a learning rate of 5e-4 for 40 epochs. We use κ = 0.7 for EViT with a ViT-B/16 backbone. We use m = 0.994 in our experiments.
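For reference, the momentum value m = 0.994 reported in the experiment setup corresponds to a standard exponential-moving-average (EMA) teacher update. The sketch below illustrates such an update in PyTorch; it is not the authors' released ECLIPSE code, the module names and wiring are illustrative assumptions, and the EViT token reduction with keep rate κ = 0.7 applied to the online encoder is not shown.

```python
# Minimal sketch (not the authors' ECLIPSE code) of a momentum-teacher update.
# Only m = 0.994 comes from the reported settings; everything else is assumed.
import copy
import torch

@torch.no_grad()
def update_momentum_teacher(online_encoder: torch.nn.Module,
                            teacher_encoder: torch.nn.Module,
                            m: float = 0.994) -> None:
    """EMA update: teacher <- m * teacher + (1 - m) * online."""
    for p_t, p_o in zip(teacher_encoder.parameters(), online_encoder.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)

# Example wiring: the teacher starts as a frozen copy of the online encoder
# and is refreshed after every optimizer step.
online = torch.nn.Linear(8, 8)      # stand-in for the online image encoder
teacher = copy.deepcopy(online)
for p in teacher.parameters():
    p.requires_grad_(False)

# ... after each training step:
update_momentum_teacher(online, teacher, m=0.994)
```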