Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expediting Contrastive Language-Image Pretraining via Self-Distilled Encoders

Authors: Bumsoo Kim, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through our extensive experiments, we validate that there is a sweet spot between expedition and distillation where the partial view from the expedited online image encoder interacts complementarily with the momentum teacher. As a result, ECLIPSE outperforms its counterparts while achieving substantial acceleration in inference speed."
Researcher Affiliation | Industry | Bumsoo Kim*, Jinhyung Kim, Yeonsik Jo, Seung Hwan Kim, LG AI Research. *correspondence to: EMAIL
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | "For implementation details, our work is built on top of the open-source SLIP codebase (Mu et al. 2021). For DeCLIP (Li et al. 2022), we follow the implementation details of the official code release." Footnotes link to https://github.com/facebookresearch/SLIP and https://github.com/Sense-GVT/DeCLIP, which are external codebases, not the authors' own code for ECLIPSE.
Open Datasets | Yes | "We pretrain ECLIPSE on large-scale open-source datasets, CC (Conceptual Captions) 3M (Sharma et al. 2018) and YFCC (Yahoo Flickr Creative Commons) 15M (Thomee et al. 2016)."
Dataset Splits | No | The paper mentions pretraining on CC3M and YFCC15M and evaluating on downstream datasets, but it does not explicitly state training, validation, and test splits for the pretraining datasets.
Hardware Specification | Yes | "All of our models are pretrained in 16 A100 GPUs."
Software Dependencies | No | The paper mentions building on the open-source SLIP codebase and following the official DeCLIP code release, but it does not specify version numbers for Python, PyTorch, CUDA, or other software libraries.
Experiment Setup | Yes | "All models are pretrained on the CC3M dataset with a learning rate 5e-4 for 40 epochs." and "We use κ = 0.7 for EViT with a ViT-B/16 backbone." and "We use m = 0.994 in our experiments."
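The momentum coefficient m = 0.994 quoted in the experiment setup refers to the exponential-moving-average update that self-distillation methods use to maintain a momentum teacher from the online (student) encoder. As a minimal sketch of that standard update rule (the function and variable names here are illustrative, not taken from the ECLIPSE code release):

```python
def ema_update(teacher_params, student_params, m=0.994):
    """Momentum-teacher update: teacher <- m * teacher + (1 - m) * student.

    With m close to 1, the teacher changes slowly, smoothing over the
    noisy per-step updates of the student encoder.
    """
    return [m * t + (1.0 - m) * s for t, s in zip(teacher_params, student_params)]

# Toy parameters standing in for encoder weights:
teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)  # teacher moves only ~0.6% toward the student
```

In practice this update is applied to every teacher parameter after each training step, with gradients disabled for the teacher.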