Revisiting the Power of Prompt for Visual Tuning

Authors: Yuzhu Wang, Lechao Cheng, Chaowei Fang, Dingwen Zhang, Manni Duan, Meng Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Exhaustive experiments show our proposed approach outperforms existing methods by a remarkable margin.
Researcher Affiliation | Academia | 1 Zhejiang Lab; 2 School of Computer Science and Information Engineering, Hefei University of Technology; 3 School of Artificial Intelligence, Xidian University; 4 School of Automation, Northwestern Polytechnical University. Correspondence to: Lechao Cheng <chenglc@hfut.edu.cn>.
Pseudocode | No | The paper describes its methods using mathematical equations and textual explanations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/WangYZ1608/Self-PromptTuning.
Open Datasets | Yes | Our experiments are carried out on two image classification benchmarks. FGVC contains 5 benchmarked Fine-Grained Visual Classification tasks, including CUB-200-2011 (Wah et al., 2011), NABirds (Van Horn et al., 2015), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Dogs (Khosla et al., 2011), and Stanford Cars (Gebru et al., 2017).
Dataset Splits | Yes | For the FGVC datasets... we randomly split the training set into train (90%) and val (10%). For VTAB-1k... we apply the 800-200 split of the train/val set. (A split sketch follows this table.)
Hardware Specification | No | The paper discusses model sizes (e.g., "ViT-H (632M parameters)") and mentions that experiments were conducted, but does not provide specific hardware details such as GPU models, CPU types, or memory used for training.
Software Dependencies | No | The paper mentions optimizers and models (e.g., "AdamW optimizer", "Vision Transformers"), but does not specify version numbers for any software dependencies or libraries (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We employ the AdamW optimizer with a mini-batch size of 32 for a total of 100 epochs (with a linear warm-up for the first 10 epochs) and a cosine learning rate schedule (Loshchilov & Hutter, 2016), which gradually decays the learning rate from its initial value to 1e-8. We process images with a random resized-crop operation to 224x224 resolution and a random horizontal flip for data augmentation. (A training-setup sketch follows this table.)
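The Dataset Splits row describes a random 90%/10% train/val split for FGVC and a fixed 800/200 split for VTAB-1k. The sketch below illustrates only the splitting logic under those stated fractions; the function name make_split, the seed, and the use of a plain index list are illustrative assumptions, not taken from the authors' released code.

    # Minimal sketch of the 90/10 (FGVC) and 800/200 (VTAB-1k) train/val splits.
    # make_split, the seed, and the index-list representation are assumptions.
    import random

    def make_split(num_train_samples: int, val_fraction: float = 0.1, seed: int = 0):
        """Randomly split training indices into train and val subsets."""
        indices = list(range(num_train_samples))
        random.Random(seed).shuffle(indices)
        num_val = int(round(val_fraction * num_train_samples))
        return indices[num_val:], indices[:num_val]   # train indices, val indices

    # FGVC-style split: 90% train, 10% val of whatever the training set size is.
    fgvc_train, fgvc_val = make_split(num_train_samples=5994, val_fraction=0.1)
    # VTAB-1k: 1,000 training images split 800/200, i.e. val_fraction=0.2.
    vtab_train, vtab_val = make_split(num_train_samples=1000, val_fraction=0.2)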
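The Experiment Setup row quotes the training recipe: AdamW, mini-batch size 32, 100 epochs with a 10-epoch linear warm-up, cosine decay toward 1e-8, and random-resized-crop plus horizontal-flip augmentation. The PyTorch sketch below is one way to realize that recipe; the placeholder model, the base learning rate of 1e-3, and the LambdaLR-based warm-up/cosine schedule are assumptions, since the paper does not publish these implementation details here.

    # Sketch of the quoted setup: AdamW, 100 epochs, 10-epoch linear warm-up,
    # cosine decay to ~1e-8, random resized crop to 224x224, horizontal flip.
    # The model and base_lr are placeholders, not the paper's exact values.
    import math
    from torch import nn, optim
    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),   # random resized crop to 224x224
        transforms.RandomHorizontalFlip(),   # random horizontal flip
        transforms.ToTensor(),
    ])

    model = nn.Linear(3 * 224 * 224, 1000)   # placeholder for the ViT backbone + head
    base_lr, total_epochs, warmup_epochs, min_lr = 1e-3, 100, 10, 1e-8
    optimizer = optim.AdamW(model.parameters(), lr=base_lr)

    def lr_lambda(epoch: int) -> float:
        """Linear warm-up for the first 10 epochs, then cosine decay toward min_lr."""
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return cosine + (min_lr / base_lr) * (1.0 - cosine)

    scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    # In the training loop: optimizer.step() per mini-batch of 32 images,
    # then scheduler.step() once per epoch for 100 epochs.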