Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Black-box Prompt Tuning for Vision-Language Model as a Service
Authors: Lang Yu, Qin Chen, Jiaju Lin, Liang He
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our proposed black-box prompt tuning framework outperforms both hand-crafted prompt engineering and gradient-based prompt learning methods, which serves as evidence of its capability to train task-relevant prompts in a derivative-free manner. |
| Researcher Affiliation | Academia | Lang Yu1,2, Qin Chen1,2, Jiaju Lin1 and Liang He1,2; 1 School of Computer Science and Technology, East China Normal University; 2 Shanghai Institute of AI for Education, East China Normal University; EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/BruthYU/BPT-VLM |
| Open Datasets | Yes | To evaluate the effectiveness of BPT-VLM, we conduct experiments on 9 visual image classification datasets: ImageNet [Deng et al., 2009], Caltech101 [Fei-Fei et al., 2004], Oxford Pets [Parkhi et al., 2012], Flowers102 [Nilsback and Zisserman, 2008], Food101 [Bossard et al., 2014], UCF101 [Soomro et al., 2012], SUN397 [Xiao et al., 2010], EuroSAT [Helber et al., 2019] and DTD [Cimpoi et al., 2014]. |
| Dataset Splits | Yes | Following the few-shot setting adopted in [Zhou et al., 2022], all methods use the same 16-shot split for prompt tuning and are evaluated on full test-sets for comparison. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions "PyCMA" and "PyPop7" as open-source libraries used for implementation, but does not provide specific version numbers for these or other key software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 1: Default Setting of Hyper-parameters includes Intrinsic Dimension 1000, Vision Prompt Length 8, Language Prompt Length 5, Population Size 30, and Loss Function Cross Entropy. |
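
The default settings quoted in the Experiment Setup and Dataset Splits rows can be collected into a single configuration object for anyone attempting a re-run. The following is a minimal sketch only: the class name `BPTVLMDefaults` and all field names are hypothetical and are not taken from the authors' repository; only the values come from the rows above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BPTVLMDefaults:
    """Hypothetical container; field names are illustrative assumptions."""

    # Values as quoted from the paper's Table 1 (Experiment Setup row).
    intrinsic_dim: int = 1000
    vision_prompt_length: int = 8
    language_prompt_length: int = 5
    population_size: int = 30
    loss_function: str = "cross_entropy"
    # Few-shot setting quoted in the Dataset Splits row.
    shots_per_class: int = 16


defaults = BPTVLMDefaults()

# Print the settings as a plain dict, e.g. for logging alongside results.
print(asdict(defaults))
```

Freezing the dataclass keeps the recorded defaults from being changed accidentally during a run. Settings the table marks as unreported (hardware, library versions) are deliberately left out rather than guessed.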