Parameter-Efficient Model Adaptation for Vision Transformers
Authors: Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct an empirical study of each efficient model adaptation method, focusing on its performance and parameter cost. Furthermore, we propose a parameter-efficient model adaptation framework, which first selects submodules by measuring local intrinsic dimensions and then projects them into a subspace for further decomposition via a novel Kronecker Adaptation (KAdaptation) method. We experiment on 20 datasets under the few-shot setting and 7 image classification datasets under the full-shot setting. (See the Kronecker-update and intrinsic-dimension sketches after this table.) |
| Researcher Affiliation | Collaboration | Xuehai He¹, Chunyuan Li², Pengchuan Zhang², Jianwei Yang², Xin Eric Wang¹ — ¹UC Santa Cruz, ²Microsoft Research at Redmond |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks are present in the paper. |
| Open Source Code | Yes | To facilitate future research, implementations of all the methods studied in this work are released at https://github.com/eric-ai-lab/PEViT. |
| Open Datasets | Yes | For few-shot benchmark experiments, we conduct experiments on 20 image classification datasets from the ELEVATER benchmark (Li et al. 2022b)... For full-shot experiments, we summarize the results by computing the average performance on CIFAR10 (Krizhevsky and Hinton 2009), CIFAR100 (Krizhevsky and Hinton 2009), SUN397 (Xiao et al. 2010), DTD (Cimpoi et al. 2014), STL10 (Coates, Ng, and Lee 2011), FGVCAircraft (Maji et al. 2013), and FER2013 (Goodfellow et al. 2013). |
| Dataset Splits | Yes | We use the official split for each of these datasets. |
| Hardware Specification | Yes | For few-shot benchmark experiments, we conduct experiments on 20 image classification datasets from the ELEVATER benchmark (Li et al. 2022b) on four Quadro RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions optimizers like SGD and AdamW and notes automatic hyper-parameter tuning, but it does not specify versions for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For benchmark experiments, we use the SGD (Ruder 2016) optimizer, with the learning rate and weight decay automatically searched for all methods so that these two hyperparameters reach their optimum combination. Training epochs are set via grid search. For intrinsic dimension experiments, we use AdamW (Kingma and Ba 2014) as the optimizer, with a weight decay of 10⁻⁸, a learning rate of 10⁻⁵, and a batch size of 32, following the setting in Li et al. (2018). (See the optimizer-setup sketch after this table.) |
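To make the KAdaptation row concrete, below is a minimal PyTorch sketch of a Kronecker-product weight update in the spirit of the paper's ΔW = Σᵢ Aᵢ ⊗ Bᵢ, with each Bᵢ further decomposed as a low-rank product uᵢvᵢᵀ. The class names, factor shapes, and initialization here are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KroneckerAdapter(nn.Module):
    """Sketch of a Kronecker-style update: delta_W = sum_i A_i kron (u_i v_i^T).

    Only the small factors A_i, u_i, v_i are trained; the pretrained weight
    stays frozen. All shapes are illustrative, not the paper's exact settings.
    """

    def __init__(self, d_out, d_in, n_factors=4, a_dim=4, rank=1):
        super().__init__()
        assert d_out % a_dim == 0 and d_in % a_dim == 0
        b_out, b_in = d_out // a_dim, d_in // a_dim
        self.A = nn.Parameter(torch.randn(n_factors, a_dim, a_dim) * 0.01)
        self.u = nn.Parameter(torch.randn(n_factors, b_out, rank) * 0.01)
        # v starts at zero so the adapted model initially equals the pretrained one.
        self.v = nn.Parameter(torch.zeros(n_factors, rank, b_in))

    def delta_w(self):
        B = self.u @ self.v  # each B_i = u_i v_i^T, shape (b_out, b_in)
        return sum(torch.kron(a, b) for a, b in zip(self.A, B))

class AdaptedLinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable Kronecker update."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)  # keep the pretrained weight fixed
        self.adapter = KroneckerAdapter(linear.out_features, linear.in_features)

    def forward(self, x):
        return F.linear(x, self.linear.weight + self.adapter.delta_w(),
                        self.linear.bias)

layer = AdaptedLinear(nn.Linear(768, 768))  # e.g., a ViT attention projection
out = layer(torch.randn(2, 768))
```

Freezing the base weight and training only the Kronecker factors is what keeps the parameter count low: with these illustrative shapes, the adapter adds roughly n_factors × (a_dim² + b_out + b_in) parameters per layer instead of d_out × d_in.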
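The "local intrinsic dimension" measurement cited in the Research Type and Experiment Setup rows follows Li et al. (2018): freeze all native parameters at θ₀ and train only a d-dimensional vector z through a fixed random projection, θ = θ₀ + Pz. A minimal sketch, assuming torch.func.functional_call (PyTorch ≥ 2.0); the class and variable names are mine, not the paper's.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

class SubspaceModel(nn.Module):
    """Train a frozen module in a random d-dimensional subspace:
    theta = theta_0 + P z (Li et al. 2018). Only z is trainable."""

    def __init__(self, module: nn.Module, d: int):
        super().__init__()
        self.module = module
        self.names, self.theta0, self.projs = [], [], []
        for name, p in module.named_parameters():
            p.requires_grad_(False)
            self.names.append(name)
            self.theta0.append(p.detach().clone())
            # Fixed (non-trainable) random projection, one block per tensor.
            self.projs.append(torch.randn(p.numel(), d) / p.numel() ** 0.5)
        self.z = nn.Parameter(torch.zeros(d))

    def forward(self, x):
        # Rebuild every parameter as theta_0 + P z, then call the module
        # functionally so gradients flow back to z alone.
        params = {
            name: t0 + (proj @ self.z).view_as(t0)
            for name, t0, proj in zip(self.names, self.theta0, self.projs)
        }
        return functional_call(self.module, params, (x,))

net = SubspaceModel(nn.Linear(16, 4), d=10)
net(torch.randn(8, 16)).sum().backward()  # gradients reach only net.z
```

The intrinsic dimension is then read off as the smallest d at which this subspace training reaches a set fraction (90% in Li et al. 2018) of full fine-tuning accuracy.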
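Finally, the Experiment Setup row translates directly into optimizer configurations. A sketch assuming PyTorch; the candidate grid values for the SGD search are placeholders, since the paper states only that learning rate and weight decay are searched automatically.

```python
import itertools
import torch
import torch.nn as nn

model = nn.Linear(768, 10)  # stand-in for the adapted classifier

# Benchmark runs: SGD with (lr, weight_decay) chosen by automatic search.
# These candidate grids are assumptions; the paper does not list them.
for lr, wd in itertools.product([1e-3, 1e-2, 1e-1], [0.0, 1e-4, 1e-2]):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=wd)
    # ... train, validate, and keep the best-performing (lr, wd) pair

# Intrinsic-dimension runs: fixed AdamW settings quoted from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=1e-8)
batch_size = 32
```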