Revisiting the Power of Prompt for Visual Tuning
Authors: Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin. |
| Researcher Affiliation | Academia | 1 Zhejiang Lab; 2 School of Computer Science and Information Engineering, Hefei University of Technology; 3 School of Artificial Intelligence, Xidian University; 4 School of Automation, Northwestern Polytechnical University. Correspondence to: Lechao Cheng <chenglc@hfut.edu.cn>. |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/WangYZ1608/Self-PromptTuning. |
| Open Datasets | Yes | Our experiments are carried out on two image classification benchmarks. FGVC contains 5 benchmarked Fine-Grained Visual Classification tasks, including CUB-200-2011 (Wah et al., 2011), NABirds (Van Horn et al., 2015), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Dogs (Khosla et al., 2011), and Stanford Cars (Gebru et al., 2017). |
| Dataset Splits | Yes | For the FGVC datasets...we randomly split the training set into train (90%) and val (10%). For VTAB-1k...we apply the 800-200 split of the train/val set. |
| Hardware Specification | No | The paper discusses model sizes (e.g., 'ViT-H (632M parameters)') and mentions experiments were conducted, but does not provide specific hardware details such as GPU models, CPU types, or memory used for training. |
| Software Dependencies | No | The paper mentions optimizers and models (e.g., 'AdamW optimizer', 'Vision Transformers'), but does not specify version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We employ the AdamW optimizer with a mini-batch size of 32 for a total of 100 epochs (with a linear warm-up for the first 10 epochs), and a cosine learning rate (Loshchilov & Hutter, 2016) schedule, which gradually decays the learning rate from its initial value to 1e-8. We process images with a random resized crop operation to 224×224 resolution and a random horizontal flip for data augmentation. A minimal sketch of this setup is given below the table. |
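
For readers reproducing the quoted experiment setup, the following is a minimal PyTorch sketch of the optimizer, warm-up/cosine schedule, and augmentation described in the last row. The base learning rate, the placeholder model, and the loop skeleton are assumptions for illustration only; they are not values or code taken from the paper.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from torchvision import transforms

# Values quoted in the table above.
EPOCHS = 100
WARMUP_EPOCHS = 10
BATCH_SIZE = 32
MIN_LR = 1e-8
BASE_LR = 1e-3  # assumed placeholder; the excerpt does not state the initial learning rate

# Data augmentation: random resized crop to 224x224 plus random horizontal flip.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Placeholder module; the paper fine-tunes ViT backbones with prompt parameters.
model = torch.nn.Linear(768, 200)

optimizer = AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(epoch: int) -> float:
    """Linear warm-up for the first 10 epochs, then cosine decay toward MIN_LR."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = MIN_LR / BASE_LR
    return floor + (1.0 - floor) * cosine

scheduler = LambdaLR(optimizer, lr_lambda)

# Loop skeleton: the schedule is stepped once per epoch.
for epoch in range(EPOCHS):
    # ... iterate over a dataloader with batch size 32 and update the model ...
    scheduler.step()
```

This sketch only mirrors the optimization schedule and augmentation quoted from the paper; the prompt-tuning method itself (how prompts are derived and inserted into the ViT) is described in the paper and its repository, not here.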