Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models

Authors: Qiong Wu, Wei Yu, Yiyi Zhou, Shubin Huang, Xiaoshuai Sun, Rongrong Ji

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate DAS, we apply it to a bunch of representative VLP models, and conduct extensive experiments on a set of VL tasks. The experimental results not only show the great advantages of DAS in reducing computational complexity, e.g. 11.97% FLOPs of METER on VQA2.0, but also confirm its competitiveness against existing PETL methods in terms of parameter scale and performance.
Researcher Affiliation | Academia | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China; 2 Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China.
Pseudocode | Yes | Algorithm 1 Dynamic Architecture Skipping
Open Source Code | Yes | Our source code is given in https://github.com/DoubtedSteam/DAS.
Open Datasets | Yes | To validate DAS, we apply it to a set of VLP models, namely METER [10], ViLT [28] and LaVIN [42], on three VL benchmarks, namely VQA2.0 [14], NLVR2 [57] and Flickr30K [51].
Dataset Splits | Yes | We conduct experiments on VQA2.0 [14]. Instead of answering the question in open-ended natural language, it is converted into a classification task with 3,129 classes. Following the previous setting [10, 28], the PETL methods and DAS are trained on the train and validation sets of VQA2.0, and we report the test-dev results from the online evaluation. Notably, the validation set is used during training for all methods. (A minimal sketch of this classification setup is given after the table.)
Hardware Specification | Yes | We conduct all experiments with a single NVIDIA Tesla A100 GPU, and the settings not mentioned are the same as ViLT [28] and METER [10].
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Following the most conventional setting [17, 59], the width of hidden states in adapters is set to 96, and the hidden dimension of the adapter used for the skip connection is set to 192 to retain a certain capacity. The VLP model is first warmed up for one epoch; in this epoch, the subnetwork is randomly sampled according to the skipped number m. The search then runs for 2 epochs, with the redundancy observation executed at the 10th step of each interval. Finally, the optimal architecture is trained for another 10 epochs. (A hedged configuration sketch of this schedule is given after the table.)
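
The Dataset Splits row above states that open-ended VQA2.0 answering is cast as a 3,129-way classification problem. As a point of reference, here is a minimal sketch of such an answer-classification head in PyTorch; the class name, layer sizes, and pooled-feature assumption are illustrative and are not taken from the released DAS code.

```python
import torch
import torch.nn as nn

NUM_ANSWERS = 3129  # VQA2.0 answers treated as classification targets, as quoted above

class VQAClassifierHead(nn.Module):
    """Maps a pooled image-text representation to answer logits (illustrative only)."""
    def __init__(self, hidden_dim: int, num_answers: int = NUM_ANSWERS):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim * 2),
            nn.LayerNorm(hidden_dim * 2),
            nn.GELU(),
            nn.Linear(hidden_dim * 2, num_answers),
        )

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        # fused_features: (batch, hidden_dim) pooled output of the VLP backbone
        return self.head(fused_features)
```

Per the quoted setting, such a head would be trained on the VQA2.0 train and validation sets, with test-dev accuracy obtained from the online evaluation server.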
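
For readability, the staged schedule quoted in the Experiment Setup row can be summarized as a single configuration object. The sketch below only mirrors the numbers reported above; the field names are assumptions of this write-up, not the authors' configuration format.

```python
from dataclasses import dataclass

@dataclass
class DASTrainingSetup:
    adapter_hidden_dim: int = 96          # width of hidden states in the adapters
    skip_adapter_hidden_dim: int = 192    # adapter on the skip connection, kept wider for capacity
    warmup_epochs: int = 1                # subnetworks sampled at random given the skipped number m
    search_epochs: int = 2                # architecture-search phase
    observation_step: int = 10            # redundancy observation at the 10th step of each interval
    final_finetune_epochs: int = 10       # training of the selected (optimal) architecture

config = DASTrainingSetup()
print(config)
```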