Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Authors: Yi Zhang, Ce Zhang, Ke Yu, Yushun Tang, Zhihai He
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our CPL method significantly improves generalization capabilities compared to the current state-of-the-art methods. |
| Researcher Affiliation | Academia | ¹Harbin Institute of Technology, ²Southern University of Science and Technology, ³Carnegie Mellon University, ⁴Pengcheng Laboratory |
| Pseudocode | No | The paper includes diagrams to illustrate the proposed method but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be available at https://github.com/rambo-coder/CPL. |
| Open Datasets | Yes | For base-to-novel generalization and cross-dataset transfer tasks, we follow previous work (Radford et al. 2021; Zhou et al. 2022b,a) to conduct the experiments on 11 representative image classification datasets, including ImageNet (Deng et al. 2009) and Caltech101 (Fei-Fei, Fergus, and Perona 2004) for generic object classification; Oxford Pets (Parkhi et al. 2012), Stanford Cars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), and FGVCAircraft (Maji et al. 2013) for fine-grained classification; SUN397 (Xiao et al. 2010) for scene recognition; UCF101 (Soomro, Zamir, and Shah 2012) for action recognition; DTD (Cimpoi et al. 2014) for texture classification; and EuroSAT (Helber et al. 2019) for satellite image recognition. |
| Dataset Splits | No | The paper mentions training sets and test sets, and specifies training epochs and few-shot settings, but does not explicitly define or refer to a distinct validation dataset split used for hyperparameter tuning. |
| Hardware Specification | Yes | We employ the AdamW optimizer with a cosine annealing scheduler and train the models on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions models like CLIP and ResNet-50, and optimizers like AdamW, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA libraries. |
| Experiment Setup | Yes | We conduct training for 70 epochs on ImageNet and 50 epochs for other datasets. We designate the number of concepts K as 10. Training involves a batch size of 256 and an initial learning rate set at 10⁻³. We employ the AdamW optimizer with a cosine annealing scheduler and train the models on a single NVIDIA RTX 3090 GPU. |
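
The experiment-setup row above maps onto a standard optimizer/scheduler configuration. The following is a minimal sketch, assuming a PyTorch-style training loop; `model`, `train_loader`, and the cross-entropy objective are illustrative placeholders rather than the authors' CPL implementation (see the repository linked in the Open Source Code row for the official code).

```python
# Sketch of the reported optimization setup: AdamW + cosine annealing,
# initial learning rate 1e-3, batch size 256 (set in the DataLoader),
# 70 epochs for ImageNet and 50 for the other datasets.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, epochs=50, lr=1e-3, device="cuda"):
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(images)   # CPL-specific forward pass would go here
            loss = loss_fn(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()             # anneal the learning rate once per epoch
```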