Revisiting the Power of Prompt for Visual Tuning

Authors: Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin.
Researcher Affiliation | Academia | 1 Zhejiang Lab; 2 School of Computer Science and Information Engineering, Hefei University of Technology; 3 School of Artificial Intelligence, Xidian University; 4 School of Automation, Northwestern Polytechnical University. Correspondence to: Lechao Cheng <chenglc@hfut.edu.cn>.
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/WangYZ1608/Self-PromptTuning.
Open Datasets | Yes | Our experiments are carried out on two image classification benchmarks. FGVC contains 5 benchmarked Fine-Grained Visual Classification tasks, including CUB-200-2011 (Wah et al., 2011), NABirds (Van Horn et al., 2015), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Dogs (Khosla et al., 2011), and Stanford Cars (Gebru et al., 2017).
Dataset Splits | Yes | For the FGVC datasets... we randomly split the training set into train (90%) and val (10%). For VTAB-1k... we apply the 800-200 split of the train/val set. (A split sketch follows this table.)
Hardware Specification | No | The paper discusses model sizes (e.g., "ViT-H (632M parameters)") and mentions that experiments were conducted, but does not provide specific hardware details such as GPU models, CPU types, or memory used for training.
Software Dependencies | No | The paper mentions optimizers and models (e.g., "AdamW optimizer", "Vision Transformers"), but does not specify version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We employ the AdamW optimizer with a mini-batch size of 32 for a total of 100 epochs (with a linear warm-up for the first 10 epochs) and a cosine learning rate schedule (Loshchilov & Hutter, 2016), which gradually decays the learning rate from its initial value to 1e-8. We process images with a random resized-crop operation to 224x224 resolution and a random horizontal flip for data augmentation. (A training-setup sketch follows this table.)
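The Dataset Splits row describes a random 90%/10% train/val split for FGVC and a fixed 800/200 split for VTAB-1k. The sketch below illustrates only the splitting logic under those stated fractions; the function name make_split, the seed, and the use of a plain index list are illustrative assumptions, not taken from the authors' released code.

    # Minimal sketch of the 90/10 (FGVC) and 800/200 (VTAB-1k) train/val splits.
    # make_split, the seed, and the index-list representation are assumptions.
    import random

    def make_split(num_train_samples: int, val_fraction: float = 0.1, seed: int = 0):
        """Randomly split training indices into train and val subsets."""
        indices = list(range(num_train_samples))
        random.Random(seed).shuffle(indices)
        num_val = int(round(val_fraction * num_train_samples))
        return indices[num_val:], indices[:num_val]   # train indices, val indices

    # FGVC-style split: 90% train, 10% val of whatever the training set size is.
    fgvc_train, fgvc_val = make_split(num_train_samples=5994, val_fraction=0.1)
    # VTAB-1k: 1,000 training images split 800/200, i.e. val_fraction=0.2.
    vtab_train, vtab_val = make_split(num_train_samples=1000, val_fraction=0.2)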
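The Experiment Setup row quotes the training recipe: AdamW, mini-batch size 32, 100 epochs with a 10-epoch linear warm-up, cosine decay toward 1e-8, and random-resized-crop plus horizontal-flip augmentation. The PyTorch sketch below is one way to realize that recipe; the placeholder model, the base learning rate of 1e-3, and the LambdaLR-based warm-up/cosine schedule are assumptions, since the paper does not publish these implementation details here.

    # Sketch of the quoted setup: AdamW, 100 epochs, 10-epoch linear warm-up,
    # cosine decay to ~1e-8, random resized crop to 224x224, horizontal flip.
    # The model and base_lr are placeholders, not the paper's exact values.
    import math
    from torch import nn, optim
    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),   # random resized crop to 224x224
        transforms.RandomHorizontalFlip(),   # random horizontal flip
        transforms.ToTensor(),
    ])

    model = nn.Linear(3 * 224 * 224, 1000)   # placeholder for the ViT backbone + head
    base_lr, total_epochs, warmup_epochs, min_lr = 1e-3, 100, 10, 1e-8
    optimizer = optim.AdamW(model.parameters(), lr=base_lr)

    def lr_lambda(epoch: int) -> float:
        """Linear warm-up for the first 10 epochs, then cosine decay toward min_lr."""
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return cosine + (min_lr / base_lr) * (1.0 - cosine)

    scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    # In the training loop: optimizer.step() per mini-batch of 32 images,
    # then scheduler.step() once per epoch for 100 epochs.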