Improving Visual Prompt Tuning for Self-supervised Vision Transformers

Authors: Seungryong Yoo, Eunji Kim, Dahuin Jung, Jungbeom Lee, Sungroh Yoon

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation. Our method outperforms VPT variants in FGVC and VTAB image classification and ADE20K semantic segmentation.
Researcher Affiliation | Academia | 1Electrical and Computer Engineering, 2Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Korea.
Pseudocode | Yes | Algorithm 1: PyTorch-like Pseudocode for Gated Prompt Tuning (an illustrative sketch of the gating mechanism is given after the table).
Open Source Code | Yes | The code is available at https://github.com/ryongithub/GatedPromptTuning.
Open Datasets | Yes | FGVC includes five fine-grained classification tasks: CUB (Wah et al., 2011), Oxford Flowers (Nilsback & Zisserman, 2008), Stanford Cars (Gebru et al., 2017), Stanford Dogs (Khosla et al., 2011) and NABirds (Van Horn et al., 2015). ... VTAB-1K (Zhai et al., 2019), which consists of 19 diverse visual classification tasks... For semantic segmentation, we evaluate the performances on ADE20K (Zhou et al., 2017) benchmark.
Dataset Splits | No | The paper uses standard benchmarks (FGVC, VTAB-1K, ADE20K), which typically have predefined splits, but it does not explicitly detail training, validation, and test splits with specific percentages, counts, or methodologies.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, memory specifications) are mentioned in the paper. It discusses computational efficiency and parameters but not the machines used.
Software Dependencies | No | The paper provides pseudocode and a link to the code repository, but it does not list specific software dependencies or library versions.
Experiment Setup | Yes | The hyperparameters used to train the models for FGVC (Table 1), VTAB-1K (Zhai et al., 2019) (Table 2), and ADE20K (Table 3) are listed in Table 5. We used the SGD optimizer, and the learning rate was searched among {0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0}. For ADE20K semantic segmentation, we used the default hyperparameters following SETR-PUP (Zheng et al., 2021). ... Table 5. Selected hyper-parameters of our method for each downstream task and SSL method. (A sketch of this learning-rate search is also given below.)
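
To make the Pseudocode row above concrete, here is a minimal PyTorch-style sketch of a per-block gate that scales how much a ViT block intervenes in the prompt tokens. The class and attribute names (GatedPromptBlock, gate_logit) and the sigmoid-interpolation form are illustrative assumptions, not the authors' exact Algorithm 1.

import torch
import torch.nn as nn

class GatedPromptBlock(nn.Module):
    # Wraps one frozen ViT block with a learnable scalar gate that controls
    # how strongly this block updates the prompt tokens (illustrative sketch,
    # not the authors' exact implementation).
    def __init__(self, vit_block):
        super().__init__()
        self.vit_block = vit_block                      # frozen pretrained ViT block
        self.gate_logit = nn.Parameter(torch.zeros(1))  # one learnable gate per block

    def forward(self, patch_tokens, prompt_tokens):
        # Process prompt and patch tokens jointly through the frozen block.
        x = torch.cat([prompt_tokens, patch_tokens], dim=1)
        out = self.vit_block(x)
        n_prompt = prompt_tokens.shape[1]
        new_prompt, new_patch = out[:, :n_prompt], out[:, n_prompt:]
        # The gate decides how much this block intervenes in the prompts:
        # g near 1 lets the block steer the prompts, g near 0 passes them through.
        g = torch.sigmoid(self.gate_logit)
        prompt_tokens = g * new_prompt + (1.0 - g) * prompt_tokens
        return new_patch, prompt_tokens

With gates learned per block, only the blocks that actually need to steer the prompts end up influencing them, which matches the paper's observation that the useful insertion points differ across self-supervised ViTs.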
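
The learning-rate grid in the Experiment Setup row can be turned into a simple search loop. The helpers build_model and train_and_evaluate are hypothetical placeholders for the task-specific pipeline, and the momentum value is an assumption; only the SGD optimizer and the grid {0.05, ..., 5.0} come from the paper.

import torch

# Learning-rate grid reported in the paper for the SGD optimizer.
LR_GRID = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

def search_learning_rate(build_model, train_and_evaluate):
    # Returns the learning rate achieving the best validation accuracy.
    # build_model() and train_and_evaluate(model, optimizer) are hypothetical
    # helpers standing in for the task-specific training and evaluation code.
    best_lr, best_acc = None, float("-inf")
    for lr in LR_GRID:
        model = build_model()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # momentum is an assumed value
        acc = train_and_evaluate(model, optimizer)
        if acc > best_acc:
            best_lr, best_acc = lr, acc
    return best_lr, best_acc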