Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Authors: Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations of DyT are conducted from multiple perspectives. In the image domain, as shown in Figure 1(b), DyT surpasses existing PEFT methods while consuming only 71% of the ViT-B FLOPs on the VTAB-1K benchmark [89]. When visual tokens are scaled up from images to videos, our DyT shows superior generalization ability on action recognition benchmarks, e.g., K400 [10] and SSv2 [25], with a reduction of 37 GFLOPs. In the scenario where labels are scaled up from recognition to segmentation, our DyT even outperforms full tuning on ADE20K [93] with a 21 GFLOPs reduction.
Researcher Affiliation | Collaboration | 1 National University of Singapore; 2 DAMO Academy, Alibaba Group; 3 Hupan Laboratory; 4 Tsinghua University
Pseudocode | No | The paper describes its methods using text, mathematical equations, and diagrams (e.g., Figure 2), but does not contain a formal pseudocode or algorithm block.
Open Source Code | Yes | Code: https://github.com/NUS-HPC-AI-Lab/Dynamic-Tuning
Open Datasets | Yes | Datasets. To evaluate the adaptation performance, we conduct experiments on the VTAB-1K [89] benchmark. [...] We also conduct experiments on three image classification datasets with complete training sets: CIFAR-100 [41], SVHN [24], and Food-101 [6]. Additionally, we adopt two video datasets, Kinetics-400 (K400) [10] and Something-Something V2 (SSv2) [25]... For the dense prediction task, we evaluate our method on two widely recognized semantic segmentation datasets, ADE20K [93] and COCO-Stuff [7].
Dataset Splits | No | The paper names the datasets used for training and testing but does not explicitly describe a validation split or how it is used.
Hardware Specification | Yes | Here, we adopt two GPUs (Tesla V100 and Tesla T4) and a Xeon(R) Platinum 8163 CPU to comprehensively evaluate the efficiency of our method and three representative PEFT methods.
Software Dependencies | No | The paper does not explicitly specify version numbers for software dependencies (e.g., Python, PyTorch, CUDA) used in the experiments.
Experiment Setup | Yes | Detailed hyperparameters for each experiment can be found in Appendix A.8. For instance, Table 14 lists: optimizer AdamW [55], base learning rate 1e-3, weight decay 0.01, batch size 1024, training crop size 224, learning-rate schedule cosine decay [54], 8 GPUs, warmup epochs 20, training epochs 100.
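The reported schedule (cosine decay with warmup over 100 epochs) can be sketched as a per-epoch learning-rate function. This is a minimal illustration, not the paper's code: the linear warmup shape and the minimum learning rate of 0 are assumptions, since Table 14 only names the schedule type and the warmup/total epoch counts.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20, total_epochs=100, min_lr=0.0):
    """Learning rate at a given epoch: linear warmup, then cosine decay.

    Defaults mirror Table 14 (base lr 1e-3, 20 warmup epochs, 100 total epochs);
    the warmup shape and min_lr are illustrative assumptions.
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Warmup peaks at the base rate, and the midpoint of the decay sits at half of it:
print(lr_at_epoch(0))   # first warmup step
print(lr_at_epoch(19))  # end of warmup: base_lr
print(lr_at_epoch(60))  # halfway through decay: base_lr / 2
```

In practice this corresponds to PyTorch's `CosineAnnealingLR` combined with a linear-warmup scheduler via `SequentialLR`; the closed-form function above just makes the shape of the schedule explicit.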