Visual Fourier Prompt Tuning
Authors: Runjia Zeng, Cheng Han, Qifan Wang, Chunshu Wu, Tong Geng, Lifu Huang, Ying Nian Wu, Dongfang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments are conducted to evaluate the performance of VFPT. In Section 2, we conduct a literature review and discuss relevant works. Our approach is presented in Section 3, where we describe how we simply yet effectively integrate FFT into visual prompt tuning. In Section 4.2, we present compelling experimental results on various benchmarks, backbones, and pretraining objectives, achieving superior performance without complex engineering design. (A hedged code sketch of the Fourier-prompt idea appears below the table.) |
| Researcher Affiliation | Collaboration | 1Rochester Institute of Technology 2University of Missouri Kansas City 3Meta AI 4University of Rochester 5UC Davis 6University of California, Los Angeles |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/runtsang/VFPT. |
| Open Datasets | Yes | Datasets. Following common practice [5, 4, 81, 83], our experiments are carried out on two image classification benchmarks. VTAB-1k [78] collects 19 benchmarked Visual Task Adaptation tasks, separated into three groups: (1) Natural includes natural images captured by standard cameras, (2) Specialized consists of images taken by specialized equipment, and (3) Structured covers tasks requiring geometric comprehension (e.g., counting, distance). The groups differ substantially in their disparity from the pretraining dataset [9] (i.e., ImageNet-21k [84]): tasks in Natural and Specialized are closely related to image classification and thus have low disparity, while tasks in Structured are regarded as distinct from image classification. Each task of VTAB-1k contains 1000 training examples with an 800/200 split for the train/val sets. FGVC contains 5 benchmarked Fine-Grained Visual Classification tasks: CUB-200-2011 [85], NABirds [86], Oxford Flowers [87], Stanford Dogs [88], and Stanford Cars [89]. The training set is split into 90% train and 10% val. |
| Dataset Splits | Yes | Each task of VTAB-1k contains 1000 training examples with an 800/200 split for the train/val sets. [...] The training set is split into 90% train and 10% val. (A sketch of these splits appears below the table.) |
| Hardware Specification | Yes | Experiments are conducted on NVIDIA A100-40GB GPUs. |
| Software Dependencies | No | VFPT is implemented in PyTorch [91]. The paper does not specify the version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | Training. Following [4, 5], we conduct a grid search on the val set to find the best tuning hyperparameters: learning rate (i.e., [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05]) and weight decay (i.e., [0.01, 0.001, 0.0001, 0.0]). Notably, VFPT does not require the specially designed large learning rate used in [4]. The learning rate is scheduled by a cosine decay policy, and training runs for 100 epochs. (A sketch of this tuning protocol appears below the table.) |
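
The core idea referenced in the Research Type row is the integration of an FFT into visual prompt tuning. Below is a minimal PyTorch sketch of what such a Fourier prompt layer could look like; the FFT dimensions, the real-part projection, and the `fourier_ratio` split are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
# Hedged sketch of a Fourier prompt layer: a subset of learnable prompt
# tokens is passed through a 2D FFT (FNet-style, keeping the real part)
# before being prepended to the frozen backbone's patch embeddings.
# `fourier_ratio` and the FFT details are illustrative assumptions.
import torch
import torch.nn as nn


class FourierPrompt(nn.Module):
    def __init__(self, num_prompts: int, dim: int, fourier_ratio: float = 0.5):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.num_fourier = int(num_prompts * fourier_ratio)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) patch embeddings from the frozen ViT
        fourier = torch.fft.fft2(self.prompts[: self.num_fourier]).real
        prompts = torch.cat([fourier, self.prompts[self.num_fourier :]], dim=0)
        prompts = prompts.unsqueeze(0).expand(x.shape[0], -1, -1)
        return torch.cat([prompts, x], dim=1)  # prepend prompts to the tokens


if __name__ == "__main__":
    layer = FourierPrompt(num_prompts=10, dim=768)
    print(layer(torch.randn(2, 196, 768)).shape)  # torch.Size([2, 206, 768])
```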
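The splits quoted in the Dataset Splits row are simple fixed partitions. A sketch assuming index-based slicing with `torch.utils.data.Subset` over pre-ordered datasets; the actual split files ship with the VTAB-1k and FGVC benchmark code.

```python
# Hedged sketch of the quoted splits; assumes each dataset is already
# ordered so that index-based slicing matches the benchmark's split files.
from torch.utils.data import Dataset, Subset


def vtab1k_split(train_set: Dataset):
    # VTAB-1k: 1000 training examples, split 800 train / 200 val.
    return Subset(train_set, range(800)), Subset(train_set, range(800, 1000))


def fgvc_split(train_set: Dataset):
    # FGVC: the original training set is split 90% train / 10% val.
    cut = int(0.9 * len(train_set))
    return Subset(train_set, range(cut)), Subset(train_set, range(cut, len(train_set)))
```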
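The Experiment Setup row describes a grid search over learning rate and weight decay with cosine decay over 100 epochs. The sketch below mirrors that protocol; the optimizer choice (SGD) and the `build_model` / `train_one_epoch` / `evaluate_on_val` callables are placeholders, not the authors' code.

```python
# Hedged sketch of the quoted tuning protocol: grid search over learning
# rate and weight decay, selecting by val accuracy, with a cosine-decay
# schedule over 100 epochs. SGD and the callables are assumptions.
import itertools
import torch

LEARNING_RATES = [50, 25, 10, 5, 2.5, 1, 0.5, 0.25, 0.1, 0.05]
WEIGHT_DECAYS = [0.01, 0.001, 0.0001, 0.0]
EPOCHS = 100


def grid_search(build_model, train_one_epoch, evaluate_on_val):
    best_cfg, best_acc = None, float("-inf")
    for lr, wd in itertools.product(LEARNING_RATES, WEIGHT_DECAYS):
        model = build_model()
        opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=wd)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=EPOCHS)
        for _ in range(EPOCHS):
            train_one_epoch(model, opt)
            sched.step()  # cosine decay of the learning rate
        acc = evaluate_on_val(model)
        if acc > best_acc:
            best_cfg, best_acc = (lr, wd), acc
    return best_cfg, best_acc
```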