Consistency-guided Prompt Learning for Vision-Language Models

Authors: Shuvendu Roy, Ali Etemad

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. (A sketch of the base-to-novel summary metric follows the table.)
Researcher Affiliation | Academia | Shuvendu Roy, Ali Etemad, Queen's University, Canada, {shuvendu.roy, ali.etemad}@queensu.ca
Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | We make our code available at https://github.com/ShuvenduRoy/CoPrompt.
Open Datasets | Yes | We evaluate our model's performance on 11 datasets that cover various recognition tasks, including generic object classification datasets such as ImageNet (Deng et al., 2009) and Caltech101 (Fei-Fei et al., 2004); fine-grained recognition datasets such as Oxford Pets (Parkhi et al., 2012), Stanford Cars (Krause et al., 2013), Flowers102 (Nilsback & Zisserman, 2008), Food101 (Bossard et al., 2014), and FGVCAircraft (Maji et al., 2013); the scene recognition dataset SUN397 (Xiao et al., 2010); the action recognition dataset UCF101 (Soomro et al., 2012); the satellite-image classification dataset EuroSAT (Helber et al., 2019); and the texture recognition dataset DTD (Cimpoi et al., 2014).
Dataset Splits | No | The paper states 'We fine-tune the model in few-shot settings with 16 samples per class for all known classes' and evaluates on 'base (few-shot performance) and novel (zero-shot performance) categories'. While it references existing protocols ('We follow the experiment setup and protocols established in CoOp (Zhou et al., 2022a) and subsequent works...'), it does not explicitly give percentages or counts for a distinct validation split in the main text. (A few-shot sampling sketch follows the table.)
Hardware Specification | Yes | The training is conducted on a single Nvidia V100 GPU.
Software Dependencies | No | The paper mentions using CLIP (ViT-B/16) and LLMs such as GPT, GPT-2, and GPT-3, but does not provide version numbers for ancillary software such as the programming language, deep learning framework, or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We fine-tune the model in few-shot settings with 16 samples per class for all known classes. The model is trained with an SGD optimizer for 8 epochs, using a batch size of 4 and a learning rate of 0.035. (A minimal training-loop sketch follows the table.)
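
The base-to-novel generalization protocol referenced in the Research Type row, following CoOp and subsequent works, is conventionally summarized by the harmonic mean of base (few-shot) and novel (zero-shot) accuracies. A minimal sketch of that summary metric; the accuracy values below are illustrative placeholders, not results from the paper:

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base and novel accuracy, the summary metric
    conventionally reported for base-to-novel generalization."""
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)

# Illustrative placeholder values, not numbers from the paper.
print(round(harmonic_mean(82.0, 75.0), 2))  # 78.34
```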
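
The Dataset Splits row notes that training uses 16 samples per class for the base classes rather than a percentage-based split. A minimal sketch of such per-class few-shot subsampling, assuming the data is available as (item, label) pairs; the function name and data layout are illustrative and not taken from the released code:

```python
import random
from collections import defaultdict

def few_shot_subset(samples, shots=16, seed=0):
    """Draw `shots` examples per class from an iterable of (item, label) pairs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append((item, label))
    subset = []
    for label, items in by_class.items():
        rng.shuffle(items)          # random selection within each class
        subset.extend(items[:shots])
    return subset
```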
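
The stated experiment setup (SGD, 8 epochs, batch size 4, learning rate 0.035, single Nvidia V100) maps onto a standard PyTorch training loop. A minimal sketch under those hyperparameters; `model`, `train_dataset`, and the plain cross-entropy objective are placeholders, and CoPrompt's consistency-guided objective would replace the loss shown here:

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, device="cuda"):
    # Hyperparameters reported in the paper: SGD, 8 epochs,
    # batch size 4, learning rate 0.035.
    loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.035)
    criterion = torch.nn.CrossEntropyLoss()  # placeholder objective

    model.to(device).train()
    for epoch in range(8):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```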