Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CyCLIP: Cyclic Contrastive Language-Image Pretraining

Authors: Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan Rossi, Vishwa Vinay, Aditya Grover

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we show that the improved consistency in CyCLIP translates to significant gains over CLIP, with gains ranging from 10%–24% for zero-shot classification accuracy on standard benchmarks (CIFAR-10, CIFAR-100, ImageNet1K) and 10%–27% for robustness to various natural distribution shifts."
Researcher Affiliation | Collaboration | Shashank Goel (UCLA); Hritik Bansal (UCLA); Sumit Bhatia (MDSR Lab, Adobe Systems); Ryan A. Rossi (Adobe Research); Vishwa Vinay (Adobe Research); Aditya Grover (UCLA)
Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block or figure.
Open Source Code | Yes | The code is available at https://github.com/goel-shashank/CyCLIP.
Open Datasets | Yes | "We use Conceptual Captions 3M [52] (CC3M) image-caption pairs as the source of multimodal pretraining data for all our models. We compare the zero-shot performance of CLIP and CyCLIP on standard image classification datasets: CIFAR-10, CIFAR-100 [31], and ImageNet1K [49]."
Dataset Splits | Yes | "The consistency score is calculated over 10K, 10K, and 50K testing images of the CIFAR-10, CIFAR-100, and ImageNet datasets respectively. We use 50K samples from the training set of each dataset for k-Nearest Neighbor prediction. We assess our models on the test set of Flickr30K (1K) and MSCOCO (5K) obtained from the well-known Karpathy [30] split."
Hardware Specification | Yes | "Further, we train our models from scratch for 64 epochs on 4 V100 GPUs with a batch size of 128 and an initial learning rate of 0.0005 with cosine scheduling and 10000 warmup steps."
Software Dependencies | No | The paper does not provide specific version numbers for the software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "Further, we train our models from scratch for 64 epochs on 4 V100 GPUs with a batch size of 128 and an initial learning rate of 0.0005 with cosine scheduling and 10000 warmup steps. The dimension of the image and text embeddings is 1024. For CyCLIP, we use λ1 = 0.25 and λ2 = 0.25 across all our experiments."
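The quoted setup specifies the schedule's parameters (initial learning rate 0.0005, 10000 warmup steps, cosine scheduling) but not its exact shape. A minimal sketch of one common formulation, linear warmup followed by cosine decay to zero, is shown below; the decay shape and the `total_steps` argument are assumptions for illustration, not details confirmed by the paper.

```python
import math

def lr_at_step(step, total_steps, base_lr=0.0005, warmup_steps=10000):
    """Cosine learning-rate schedule with linear warmup.

    base_lr and warmup_steps follow the values quoted in the setup row;
    the linear-warmup + cosine-decay-to-zero shape is an assumption
    (a common formulation), not taken from the paper.
    """
    if step < warmup_steps:
        # linear warmup from 0 to base_lr
        return base_lr * step / warmup_steps
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

As a rough check on scale: with CC3M's ~3M pairs and a batch size of 128, one epoch is on the order of 23K steps, so 64 epochs give roughly 1.5M total steps and the 10K warmup steps cover well under the first epoch.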