Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing

Authors: Wei Dong, Dawei Yan, Zhijun Lin, Peng Wang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on 24 downstream image classification tasks using various Vision Transformer variants to evaluate our method. The results demonstrate that our approach achieves compelling transfer learning performance with a reduced parameter count.
Researcher Affiliation | Collaboration | Wei Dong (1,4), Dawei Yan (1), Zhijun Lin (2), Peng Wang (3). 1: College of Information and Control Engineering, Xi'an University of Architecture and Technology. 2: School of Computer Science, Northwestern Polytechnical University. 3: School of Computer Science and Engineering, University of Electronic Science and Technology of China. 4: Xi'an Hypersonic Measurement Technology Co., Ltd.
Pseudocode | No | No structured pseudocode or algorithm blocks are provided in the paper.
Open Source Code | Yes | Our code is available at https://github.com/DavidYanAnDe/ARC.
Open Datasets | Yes | We evaluate the effectiveness of our ARC approach on two sets of visual task adaptation benchmarks, comprising a total of 24 datasets. The list of datasets used for evaluation is provided below: FGVC. ... CUB-200-2011 [31], NABirds [32], Oxford Flowers [33], Stanford Dogs [34], and Stanford Cars [35]. VTAB-1k. We also evaluate our ARC method on the VTAB-1k benchmark [36]...
Dataset Splits | Yes | Each downstream task in the VTAB-1k benchmark consists of 1000 training examples. Following VPT [6], we set aside 200 samples from the training set as the validation set to select hyperparameters. Subsequently, we train the model on the full training data using the selected hyperparameters. (A minimal sketch of this split protocol follows the table.)
Hardware Specification | Yes | All experiments were conducted using the PyTorch [38] framework on an NVIDIA A40 GPU with 48 GB of GPU memory.
Software Dependencies | No | The paper mentions only the "PyTorch [38] framework", without specific version numbers for it or any other key libraries.
Experiment Setup | Yes | We have used grid search to select hyper-parameters such as the learning rate, weight decay, and batch size, using the validation set of each task, as in VPT [6]. All experiments were conducted using the PyTorch [38] framework on an NVIDIA A40 GPU with 48 GB of GPU memory. Table 8 (implementation details of configurations such as optimizer and hyper-parameters):
Learning Rate | {0.2, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001}
Weight Decay | {0.05, 0.01, 0.005, 0.001, 0}
Batch Size | {256, 128, 32}
Adapter Dropout | {0.8, 0.5, 0.1, 0}
Learning Rate Schedule | Cosine Decay
Training Epochs | 100
Warmup Epochs | 10
(A grid-search sketch using these values follows the table.)
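
To make the Dataset Splits protocol concrete, below is a minimal sketch of the VTAB-1k hyperparameter-selection split: 200 of the 1000 training examples are held out as a validation set, and after selection the model is retrained on the full training data. The dataset object and the search/train helpers are illustrative assumptions, not code from the ARC repository.

```python
import torch
from torch.utils.data import random_split

def make_vtab_splits(train_set, val_size=200, seed=0):
    """Split a 1000-example VTAB-1k training set into 800 train / 200 val."""
    generator = torch.Generator().manual_seed(seed)
    train_subset, val_subset = random_split(
        train_set, [len(train_set) - val_size, val_size], generator=generator
    )
    return train_subset, val_subset

# Usage (illustrative; helpers below are hypothetical):
# train_subset, val_subset = make_vtab_splits(full_train_set)
# best_cfg = search_hyperparameters(train_subset, val_subset)  # grid search
# final_model = train(full_train_set, best_cfg)  # retrain on all 1000 examples
```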
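
And here is a hedged sketch of the Experiment Setup: a grid search over Table 8's ranges with linear warmup for 10 epochs followed by cosine decay over 100 epochs. The grids, schedule, and epoch counts come from the table above; the optimizer choice (AdamW) and the build_arc_model / train_one_epoch / evaluate helpers are assumptions for illustration, not the paper's code.

```python
import itertools
import math
import torch

# Search grids from Table 8.
GRID = {
    "lr": [0.2, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0001],
    "weight_decay": [0.05, 0.01, 0.005, 0.001, 0],
    "batch_size": [256, 128, 32],
    "adapter_dropout": [0.8, 0.5, 0.1, 0],
}
EPOCHS, WARMUP_EPOCHS = 100, 10

def lr_lambda(epoch):
    """Linear warmup for 10 epochs, then cosine decay to zero at epoch 100."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1 + math.cos(math.pi * progress))

best_cfg, best_acc = None, -1.0
for lr, wd, bs, drop in itertools.product(*GRID.values()):
    model = build_arc_model(adapter_dropout=drop)          # hypothetical helper
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    for epoch in range(EPOCHS):
        train_one_epoch(model, optimizer, batch_size=bs)   # hypothetical helper
        scheduler.step()
    acc = evaluate(model, val_subset)                      # hypothetical helper
    if acc > best_acc:
        best_cfg, best_acc = (lr, wd, bs, drop), acc
```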