Compound Text-Guided Prompt Tuning via Image-Adaptive Cues
Authors: Hao Tan, Jun Li, Yizhuang Zhou, Jun Wan, Zhen Lei, Xiangyu Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on few-shot recognition and domain generalization demonstrate that TGP-T achieves superior performance with consistently lower training costs. |
| Researcher Affiliation | Collaboration | 1MAIS, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 3MEGVII Technology 4CAIR, HKISI, Chinese Academy of Sciences, Hong Kong, China {tanhao2023, lijun2021, jun.wan, zhen.lei}@ia.ac.cn, {zhouyizhuang, zhangxiangyu}@megvii.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | The code is available at https://github.com/EricTan7/TGP-T. |
| Open Datasets | Yes | Following CLIP (Radford et al. 2021), we adopt 11 publicly available image classification datasets that cover diverse scenes and scales, including ImageNet (Deng et al. 2009), Caltech (Fei-Fei, Fergus, and Perona 2004), Oxford Pets (Parkhi et al. 2012), Flowers (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), Stanford Cars (Krause et al. 2013), FGVCAircraft (Maji et al. 2013), EuroSAT (Helber et al. 2019), UCF101 (Soomro, Zamir, and Shah 2012), DTD (Cimpoi et al. 2014), and SUN397 (Xiao et al. 2010). |
| Dataset Splits | Yes | We follow the few-shot evaluation protocol in CoOp (Zhou et al. 2022b), i.e., we use 1, 2, 4, 8, and 16 shots for training, respectively, and report results on the full test sets. ... We tune the hyperparameters on a few-shot validation set with min(n, 4) shots (n is the number of training shots) rather than searching on the test set. |
| Hardware Specification | Yes | Note that when using a batch size of 8, CoCoOp runs into out-of-memory (OOM) problems on Stanford Cars, SUN397, and ImageNet with an Nvidia RTX 3090. ... Moreover, TGP-T enables the utilization of more powerful backbones such as ViT-L/14, while CoOp, CoCoOp, and MaPLe run into out-of-memory (OOM) problems on an Nvidia RTX 3090. |
| Software Dependencies | No | The paper mentions software such as the 'AdamW optimizer' but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | We set ViT-B/16 as the image encoder. The depth of the Bonder is set to 1. The number of category-wise and content-wise prompt queries is 32 and 64, respectively. We adopt the AdamW optimizer (Loshchilov and Hutter 2017) with a learning rate of 5e-5 and a weight decay of 1e-4. The model is trained for 12,800 iterations with a batch size of 8. |
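The experiment-setup row can be collected into a single configuration sketch. The values below are the ones reported in the paper; the function name `tgp_t_config` and the key names are illustrative, not taken from the authors' released code.

```python
# Hyperparameters reported for TGP-T (ViT-B/16 backbone).
# Key names are illustrative; values come from the paper's experiment setup.
def tgp_t_config():
    return {
        "image_encoder": "ViT-B/16",
        "bonder_depth": 1,
        "category_prompt_queries": 32,
        "content_prompt_queries": 64,
        "optimizer": "AdamW",
        "learning_rate": 5e-5,
        "weight_decay": 1e-4,
        "iterations": 12_800,
        "batch_size": 8,
    }

cfg = tgp_t_config()
print(cfg["optimizer"], cfg["learning_rate"])  # AdamW 5e-05
```

A reproduction would pass these values to the optimizer and training loop; note that the validation-shot rule min(n, 4) from the dataset-splits row is applied per training-shot setting, not fixed globally.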