Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Authors: Yi Zhang, Ce Zhang, Ke Yu, Yushun Tang, Zhihai He
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our CPL method significantly improves generalization capabilities compared to the current state-of-the-art methods. |
| Researcher Affiliation | Academia | ¹Harbin Institute of Technology, ²Southern University of Science and Technology, ³Carnegie Mellon University, ⁴Pengcheng Laboratory |
| Pseudocode | No | The paper includes diagrams to illustrate the proposed method but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at https://github.com/rambo-coder/CPL. |
| Open Datasets | Yes | For base-to-novel generalization and cross-dataset transfer tasks, we follow previous work (Radford et al. 2021; Zhou et al. 2022b,a) to conduct the experiments on 11 representative image classification datasets, including ImageNet (Deng et al. 2009) and Caltech101 (Fei-Fei, Fergus, and Perona 2004) for generic object classification; Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), and FGVCAircraft (Maji et al. 2013) for fine-grained classification; SUN397 (Xiao et al. 2010) for scene recognition; UCF101 (Soomro, Zamir, and Shah 2012) for action recognition; DTD (Cimpoi et al. 2014) for texture classification; and EuroSAT (Helber et al. 2019) for satellite image recognition. |
| Dataset Splits | No | The paper mentions training sets and test sets, and specifies training epochs and few-shot settings, but does not explicitly define or refer to a distinct validation dataset split used for hyperparameter tuning. |
| Hardware Specification | Yes | We employ the AdamW optimizer with a cosine annealing scheduler and train the models on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions models like CLIP and ResNet-50, and optimizers like AdamW, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA libraries. |
| Experiment Setup | Yes | We conduct training for 70 epochs on ImageNet and 50 epochs for other datasets. We designate the number of concepts K as 10. Training involves a batch size of 256 and an initial learning rate set at 10⁻³. We employ the AdamW optimizer with a cosine annealing scheduler and train the models on a single NVIDIA RTX 3090 GPU. |
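
The experiment-setup row above maps onto a standard optimizer/scheduler configuration. The following is a minimal sketch, assuming a PyTorch-style training loop; `model`, `train_loader`, and the cross-entropy objective are illustrative placeholders rather than the authors' CPL implementation (see the repository linked in the Open Source Code row for the official code).

```python
# Sketch of the reported optimization setup: AdamW + cosine annealing,
# initial learning rate 1e-3, batch size 256 (set in the DataLoader),
# 70 epochs for ImageNet and 50 for the other datasets.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, epochs=50, lr=1e-3, device="cuda"):
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)   # CPL-specific forward pass would go here
            loss = loss_fn(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()             # anneal the learning rate once per epoch
```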