Black-box Prompt Tuning for Vision-Language Model as a Service

Authors: Lang Yu, Qin Chen, Jiaju Lin, Liang He

IJCAI 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner." |
| Researcher Affiliation | Academia | Lang Yu (1,2), Qin Chen (1,2), Jiaju Lin (1), Liang He (1,2). (1) School of Computer Science and Technology, East China Normal University; (2) Shanghai Institute of AI for Education, East China Normal University. {lyu, jiaju lin}@stu.ecnu.edu.cn, {qchen, lhe}@cs.ecnu.edu.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/BruthYU/BPT-VLM |
| Open Datasets | Yes | To evaluate the effectiveness of BPT-VLM, experiments are conducted on 9 image classification datasets: ImageNet [Deng et al., 2009], Caltech101 [Fei-Fei et al., 2004], Oxford Pets [Parkhi et al., 2012], Flowers102 [Nilsback and Zisserman, 2008], Food101 [Bossard et al., 2014], UCF101 [Soomro et al., 2012], SUN397 [Xiao et al., 2010], EuroSAT [Helber et al., 2019], and DTD [Cimpoi et al., 2014]. |
| Dataset Splits | Yes | Following the few-shot setting of [Zhou et al., 2022], all methods use the same 16-shot split for prompt tuning and are evaluated on the full test sets for comparison. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments (e.g., GPU model, CPU model, memory). |
| Software Dependencies | No | The paper mentions the open-source libraries pycma and PyPop7 used for the implementation, but does not give version numbers for these or for other key components such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 1 (Default Setting of Hyper-parameters) lists: intrinsic dimension 1000, vision prompt length 8, language prompt length 5, population size 30, and cross-entropy as the loss function. |