Historical Test-time Prompt Tuning for Vision Foundation Models
Authors: Jingyi Zhang, Jiaxing Huang, Xiaoqin Zhang, Ling Shao, Shijian Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that HisTPT achieves superior prompt tuning performance consistently while handling different visual recognition tasks (e.g., image classification, semantic segmentation, and object detection) and test samples from continuously changing domains. |
| Researcher Affiliation | Collaboration | 1 College of Computing and Data Science, Nanyang Technological University, Singapore 2 College of Computer Science and Technology, Zhejiang University of Technology, China 3 UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China |
| Pseudocode | Yes | We provide the pseudo codes of the proposed historical test-time prompt tuning (HisTPT), as shown in Algorithm 1. |
| Open Source Code | No | Code will be released after being accepted. |
| Open Datasets | Yes | We evaluate HisTPT over multiple datasets across three widely studied visual recognition tasks: Semantic Segmentation: We benchmark HisTPT over 6 image segmentation datasets with pixel-wise annotations, including Cityscapes [16], BDD100K [67], Mapillary [68], ADE20K [69], Pascal Context [70] and ACDC [17]. |
| Dataset Splits | No | The paper mentions 'test samples' and a 'continuous flow' of data but does not explicitly specify traditional train/validation/test splits used for model development or evaluation, nor does it refer to predefined validation splits for the datasets used. |
| Hardware Specification | Yes | All the experiments are conducted on one NVIDIA Tesla V100 GPU with batch size 1. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and models like 'SEEM' and 'CLIP' but does not provide specific version numbers for its software dependencies (e.g., PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | In training, we employ the AdamW optimizer [84] with a weight decay of 0.05, and set the initial learning rate as 0.0001. For all experiments, the prompt is initialized as 'a photo of a' and the corresponding 4 tokens (i.e., M = 4) of dimension D = 512 are optimized as in [7, 8]. Unless otherwise specified, we set the size of the local knowledge bank and hard-sample knowledge bank at L = H = 32 and the number of the selected hard-sample features K at 16. We set the update coefficient γ of the global knowledge bank at 0.99. Following [7], we set the optimization step in test-time prompt tuning at 1 by default. |
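The setup row above fixes several bookkeeping hyperparameters: local and hard-sample knowledge banks capped at L = H = 32 entries, and a global knowledge bank updated with coefficient γ = 0.99. The sketch below illustrates how such banks could be maintained over a stream of test features; the class name, the FIFO eviction policy, and the entropy-based hard-sample criterion are illustrative assumptions, not the paper's released implementation.

```python
import numpy as np

class KnowledgeBanks:
    """Illustrative knowledge-bank bookkeeping for test-time prompt tuning.

    local: recent test features, capped at max_size (L = 32 in the paper).
    hard:  features flagged as hard samples, capped at max_size (H = 32).
    global_mean: exponential moving average with coefficient gamma = 0.99.
    """

    def __init__(self, dim=512, max_size=32, gamma=0.99):
        self.local = []
        self.hard = []
        self.global_mean = np.zeros(dim)
        self.max_size = max_size
        self.gamma = gamma

    def update(self, feature, entropy, hard_threshold=1.0):
        # Local bank: keep only the most recent max_size features (FIFO).
        self.local.append(feature)
        if len(self.local) > self.max_size:
            self.local.pop(0)
        # Hard-sample bank: store features whose prediction entropy is
        # high (the threshold here is an assumed stand-in criterion).
        if entropy > hard_threshold:
            self.hard.append(feature)
            if len(self.hard) > self.max_size:
                self.hard.pop(0)
        # Global bank: EMA update, g <- gamma * g + (1 - gamma) * feature.
        self.global_mean = (self.gamma * self.global_mean
                            + (1 - self.gamma) * feature)
```

With constant input the global mean converges geometrically toward the feature at rate (1 − γ), which is why a γ of 0.99 yields a slowly drifting summary of the whole test stream while the size-32 banks track only recent and hard samples.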