Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Authors: Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations for DyT are conducted from multiple perspectives. In the image domain, as shown in Figure 1(b), DyT surpasses existing PEFT methods while consuming only 71% of the ViT-B FLOPs on the VTAB-1K benchmark [89]. When visual tokens are scaled up from images to videos, our DyT shows superior generalization ability on action recognition benchmarks, e.g., K400 [10] and SSv2 [25], with a reduction of 37 GFLOPs. In the scenario where labels are scaled up from recognition to segmentation, our DyT even outperforms full tuning on ADE20K [93] with a 21 GFLOPs reduction. |
| Researcher Affiliation | Collaboration | National University of Singapore; DAMO Academy, Alibaba Group; Hupan Laboratory; Tsinghua University |
| Pseudocode | No | The paper describes its methods using text, mathematical equations, and diagrams (e.g., Figure 2), but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | Yes | Code: https://github.com/NUS-HPC-AI-Lab/Dynamic-Tuning |
| Open Datasets | Yes | Datasets. To evaluate the adaptation performance, we conduct experiments on the VTAB-1K [89] benchmark. [...] we also conduct experiments on three image classification datasets with complete training sets, including CIFAR-100 [41], SVHN [24], and Food-101 [6]. Additionally, we adopt two video datasets, Kinetics-400 (K400) [10] and Something-Something V2 (SSv2) [25]... For the dense prediction task, we evaluate our method on two widely recognized semantic segmentation datasets, ADE20K [93] and COCO-Stuff [7]. |
| Dataset Splits | No | The paper mentions various datasets for training and testing but does not explicitly describe a validation dataset split or how it's used. |
| Hardware Specification | Yes | Here, we adopt two GPUs (Tesla V100 and Tesla T4) and a Xeon(R) Platinum 8163 CPU to comprehensively evaluate the efficiency of our method and three representative PEFT methods. A generic latency-measurement sketch in this spirit follows the table. |
| Software Dependencies | No | The paper does not explicitly specify the version numbers of software dependencies (e.g., Python, PyTorch, CUDA) used for the experiments. |
| Experiment Setup | Yes | Detailed hyperparameters for each experiment can be found in Appendix A.8. For instance, Table 14 lists: optimizer AdamW [55], base learning rate 1e-3, weight decay 0.01, batch size 1024, training crop size 224, cosine-decay learning rate schedule [54], 8 GPUs, 20 warmup epochs, 100 training epochs. A hedged configuration sketch follows the table. |
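
The hardware row above reports that efficiency was measured on specific GPUs and a CPU. The following is a minimal sketch of that style of latency benchmarking in PyTorch; the `vit_b_16` stand-in model and the warmup/iteration counts are illustrative assumptions, not the paper's DyT implementation or its actual benchmarking script.

```python
# Minimal inference-latency measurement sketch (assumed setup, not the
# paper's code): warmup iterations, CUDA synchronization, then wall-clock
# timing over repeated forward passes.
import time
import torch
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.vit_b_16().to(device).eval()  # stand-in ViT-B backbone
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):               # warmup to exclude one-time setup costs
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()      # drain queued kernels before timing
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"mean latency: {elapsed / 100 * 1e3:.2f} ms on {device}")
```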
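The experiment-setup row quotes the Table 14 hyperparameters. Below is a minimal PyTorch sketch of that optimizer and schedule, assuming a linear warmup to the base learning rate over the first 20 epochs followed by cosine decay to zero (the warmup shape is an assumption; the `nn.Linear` module is a placeholder for the ViT-B backbone and head).

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters quoted from Table 14 of the paper.
BASE_LR = 1e-3
WEIGHT_DECAY = 0.01
WARMUP_EPOCHS = 20
TOTAL_EPOCHS = 100

model = torch.nn.Linear(768, 100)  # placeholder for the ViT-B backbone + head

optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def cosine_with_warmup(epoch: int) -> float:
    """LR multiplier: linear warmup, then cosine decay to zero (assumed shape)."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=cosine_with_warmup)

for epoch in range(TOTAL_EPOCHS):
    # ... one pass over batches of 1024 images at 224x224 crops would go here ...
    optimizer.step()    # single placeholder step so the script runs end to end
    scheduler.step()    # advance the schedule once per epoch
```

The actual training loop, data pipeline, and DyT-specific modules are in the repository linked in the Open Source Code row.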