SA²VP: Spatially Aligned-and-Adapted Visual Prompt
Authors: Wenjie Pei, Tongqi Xia, Fanglin Chen, Jinsong Li, Jiandong Tian, Guangming Lu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three challenging benchmarks for image classification demonstrate the superiority of our model over other state-of-the-art methods for visual prompt tuning. Code is available at https://github.com/tommy-xq/SA2VP. [...] Experiments Experimental Setup Datasets. We conduct experiments on three challenging benchmarks across diverse scenes: FGVC, HTA and VTAB-1k (Zhai et al. 2019). |
| Researcher Affiliation | Collaboration | ¹Harbin Institute of Technology, Shenzhen; ²Shenzhen Jiang & Associates Creative Design Co., Ltd; ³Shenyang Institute of Automation, Chinese Academy of Sciences |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/tommy-xq/SA2VP. |
| Open Datasets | Yes | We conduct experiments on three challenging benchmarks across diverse scenes: FGVC, HTA and VTAB-1k (Zhai et al. 2019). FGVC benchmark contains 5 image datasets, including CUB (Wah et al. 2011), NABirds (Van Horn et al. 2015), Oxford Flowers (Nilsback and Zisserman 2008), Stanford Dogs (Khosla et al. 2011) and Stanford Cars (Gebru et al. 2017). [...] HTA [...] including CIFAR10 (Krizhevsky, Hinton et al. 2009), CIFAR100 (Krizhevsky, Hinton et al. 2009), DTD (Cimpoi et al. 2014), CUB-200 (Wah et al. 2011), NABirds (Van Horn et al. 2015), Stanford-Dogs (Khosla et al. 2011), Oxford-Flowers (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), GTSRB (Stallkamp et al. 2012) and SVHN (Netzer et al. 2011). |
| Dataset Splits | Yes | We follow VPT to split data for training and test. [...] Using the default train-val-test split, we follow the experimental configuration of DAM-VP (Huang et al. 2023) for a fair comparison. [...] Each dataset in VTAB-1k contains 1000 training images, among which 800 are used for training and the remaining 200 for validation (Zhai et al. 2019). A minimal sketch of this split follows the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper only mentions the optimizer (AdamW) but does not provide specific software dependencies such as programming languages, libraries, or frameworks with version numbers. |
| Experiment Setup | Yes | AdamW (Loshchilov and Hutter 2017) is used for optimization with an initial learning rate of 1e-3, weight decay of 1e-4, and a batch size of 64 or 128. Since all experiments are performed for image classification on all benchmarks, classification accuracy is used as the evaluation metric. A hedged optimizer sketch follows the table. |
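
The VTAB-1k split quoted in the Dataset Splits row is simple enough to sketch. Below is a minimal Python illustration, assuming the 1000 training images are identified by integer IDs; the function name `split_vtab1k` and the seeded shuffle are illustrative assumptions, not the authors' actual protocol (which follows VPT). Only the 800/200 proportion is grounded in the paper.

```python
import random

def split_vtab1k(image_ids, seed=0):
    """Split the 1000 VTAB-1k training images into 800 train / 200 val.

    The 800/200 proportion is quoted from the paper; the seeded shuffle
    is an assumption, since the exact split protocol follows VPT.
    """
    assert len(image_ids) == 1000, "VTAB-1k provides 1000 training images"
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return shuffled[:800], shuffled[800:]

train_ids, val_ids = split_vtab1k(range(1000))
print(len(train_ids), len(val_ids))  # 800 200
```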
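
Likewise, the Experiment Setup row can be mirrored in a short PyTorch sketch. The linear-layer placeholder and the `accuracy` helper are assumptions for illustration; only the AdamW hyperparameters (learning rate 1e-3, weight decay 1e-4) and the batch sizes (64 or 128) come from the paper.

```python
import torch

# Placeholder for the prompted ViT backbone; the actual SA2VP model
# is defined in the authors' repository (github.com/tommy-xq/SA2VP).
model = torch.nn.Linear(768, 100)

# Hyperparameters quoted from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
batch_size = 64  # the paper reports 64 or 128, depending on the benchmark

def accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Top-1 classification accuracy, the paper's evaluation metric."""
    return (logits.argmax(dim=-1) == labels).float().mean().item()
```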