Few-Shot Adversarial Prompt Learning on Vision-Language Models
Authors: Yiwei Zhou, Xiaobo Xia, Zhiwei Lin, Bo Han, Tongliang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We justify our claims through a series of experiments on 11 benchmark datasets covering multiple recognition tasks. |
| Researcher Affiliation | Academia | Yiwei Zhou, School of Automation, Beijing Institute of Technology (zhouyiwei@bit.edu.cn); Xiaobo Xia, Sydney AI Centre, University of Sydney (xiaoboxia.uni@gmail.com); Zhiwei Lin, School of Automation, Beijing Institute of Technology (linzhiwei@bit.edu.cn); Bo Han, Department of Computer Science, Hong Kong Baptist University (bhanml@comp.hkbu.edu.hk); Tongliang Liu, Sydney AI Centre, University of Sydney (tongliang.liu@sydney.edu.au) |
| Pseudocode | Yes | Appendix A (Pipelines of Adversarial Prompt Learning and Testing): For a better understanding of the designed algorithm, we describe our adversarial prompt learning and adversarial prompt testing pipelines in Algorithm 1 and Algorithm 2, respectively. (A minimal PyTorch sketch of this training loop appears after the table.) |
| Open Source Code | Yes | Code is available at: https://github.com/lionel-w2/FAP. |
| Open Datasets | Yes | To evaluate the proposed method, we align with previous works [28, 33] and utilize 11 diverse image recognition datasets that span multiple vision tasks. Specifically, the datasets include two generic object datasets: ImageNet-1K [20] and Caltech101 [32]; a texture recognition dataset: DTD [34]; five fine-grained object recognition datasets: FGVCAircraft [35], Oxford Pets [36], Flowers102 [37], Food101 [38], and Stanford Cars [39]; a scene recognition dataset: SUN397 [40]; an action recognition dataset: UCF101 [41]; and a satellite image classification dataset: EuroSAT [42]. |
| Dataset Splits | No | The paper trains on a 'few-shot dataset S' and evaluates on a 'test dataset', but does not explicitly mention a validation set or validation split for hyperparameter tuning or model selection in its experimental setup. Training runs for a fixed number of epochs. |
| Hardware Specification | Yes | Experiments of adversarial prompt tuning on the Image Net-1K dataset are carried out on a single NVIDIA RTX A40 GPU, while experiments on the other 10 datasets are performed on a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | Yes | All experiments are conducted in an environment running PyTorch 1.10.1 and CUDA 11.3 on Python 3.8. |
| Experiment Setup | Yes | All models are trained for 5 epochs in cross-dataset evaluation and for 10 epochs in the other benchmark settings, using an SGD optimizer with a momentum of 0.9. The initial learning rate is set to 0.0035. We apply a cosine learning rate scheduler and a warm-up strategy during the first epoch. For adversarial prompt learning, we use token prompts of size 2 in both the vision and text branches across the first 9 transformer blocks. Attacks are generated under the ℓ∞ threat model through a 2-step PGD attack, with a perturbation boundary ϵ = 1/255 and a step size α = 1/255, following the methodologies outlined in [11]. (A sketch of the warm-up/cosine schedule appears after the table.) |
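To make the pseudocode and setup rows concrete, below is a minimal PyTorch sketch of an adversarial prompt learning loop in the spirit of Algorithm 1: a 2-step ℓ∞ PGD attack with ε = α = 1/255 generates adversarial examples, and SGD (momentum 0.9, initial learning rate 0.0035) updates only the prompt parameters while the CLIP backbone stays frozen. Names such as `model`, `loader`, and `prompt_params` are placeholders, and the plain adversarial cross-entropy loss is an assumption; the paper's actual objective is richer.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=1/255, alpha=1/255, steps=2):
    """2-step PGD under an l_inf threat model, matching the reported setup.
    `model` is a hypothetical stand-in that maps images to class logits
    (e.g., prompted CLIP similarity scores). Assumes images lie in [0, 1]."""
    delta = torch.zeros_like(images).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model((images + delta).clamp(0, 1)), labels)
        grad, = torch.autograd.grad(loss, delta)
        # Gradient-sign step, then project back into the l_inf ball.
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (images + delta).clamp(0, 1).detach()

def adversarial_prompt_learning(model, loader, prompt_params, epochs=10):
    """Skeleton of the training pipeline: only the learnable prompt tokens
    are optimized on adversarial examples; all CLIP weights stay frozen."""
    opt = torch.optim.SGD(prompt_params, lr=0.0035, momentum=0.9)
    for _ in range(epochs):
        for images, labels in loader:
            adv = pgd_attack(model, images, labels)
            loss = F.cross_entropy(model(adv), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```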
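The experiment-setup row also mentions a cosine learning-rate scheduler with a warm-up during the first epoch. Below is a minimal sketch of such a per-iteration schedule; the warm-up shape (a linear ramp from zero) is an assumption, since the paper states only that a warm-up strategy is applied in the first epoch.

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine_scheduler(optimizer, iters_per_epoch, epochs):
    """Cosine decay with a first-epoch warm-up, stepped once per iteration."""
    warmup_iters = iters_per_epoch
    total_iters = iters_per_epoch * epochs

    def lr_lambda(it):
        if it < warmup_iters:
            return (it + 1) / warmup_iters  # linear ramp over the first epoch
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)
```

Used together with the loop above, one would call `scheduler.step()` after each optimizer step so the base rate of 0.0035 ramps up during epoch 1 and then decays along the cosine curve.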