Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models
Authors: Qiong Wu, Wei Yu, Yiyi Zhou, Shubin Huang, Xiaoshuai Sun, Rongrong Ji
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate DAS, we apply it to a bunch of representative VLP models, and conduct extensive experiments on a set of VL tasks. The experimental results not only show the great advantages of DAS in reducing computational complexity, e.g., -11.97% FLOPs of METER on VQA2.0, but also confirm its competitiveness against existing PETL methods in terms of parameter scale and performance. |
| Researcher Affiliation | Academia | 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. 2 Institute of Artificial Intelligence, Xiamen University, 361005, P.R. China. |
| Pseudocode | Yes | Algorithm 1 Dynamic Architecture Skipping |
| Open Source Code | Yes | Our source code is given in https://github.com/DoubtedSteam/DAS. |
| Open Datasets | Yes | To validate DAS, we apply it to a set of VLP models, namely METER [10], ViLT [28] and LaVIN [42], on three VL benchmarks, namely VQA2.0 [14], NLVR2 [57] and Flickr30K [51]. |
| Dataset Splits | Yes | We conduct experiments on VQA2.0 [14]. Instead of answering the question in open-ended natural language, it is converted into a classification task with 3,129 classes. Following the previous setting [10, 28], the PETL methods and DAS are trained on the train and validation sets of VQA2.0, and we report the test-dev results from the online evaluation. Notably, the validation set is used during training for all methods. |
| Hardware Specification | Yes | We conduct all experiments with a single NVIDIA Tesla A100 GPU and the settings not mentioned are the same as ViLT [28] and METER [10]. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Following the most conventional setting [17, 59], the width of hidden states in adapters is set to 96. And the hidden dimension of the adapter used for the skip connection is set to 192 to retain a certain capacity. The VLP model is first warmed up for one epoch. In this epoch, the subnetwork is randomly sampled according to the skipped number m. Then the search runs 2 epochs and the redundancy observation is executed at 10-th step per interval. Finally, the optimal architecture will be trained for another 10 epochs. |
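
The Experiment Setup row describes the DAS training schedule only in prose. Below is a minimal, self-contained PyTorch sketch of that schedule, assuming a toy backbone. The adapter widths (96 and 192), the single warm-up epoch with randomly sampled skipped layers, the two search epochs with a redundancy observation every 10 steps, and the ten final training epochs are taken from the quoted setup; everything else (the `ToyBackbone`, the weight-norm redundancy proxy, the layer count, the skip budget `M_SKIPPED`, and the synthetic data) is a hypothetical placeholder and not the authors' implementation.

```python
# Hedged sketch of a DAS-style schedule; module names and the redundancy proxy
# are hypothetical stand-ins, not the paper's actual code.
import random
import torch
import torch.nn as nn

ADAPTER_DIM = 96        # hidden width of per-layer adapters (from the paper)
SKIP_ADAPTER_DIM = 192  # wider adapter on the skip connection (from the paper)
NUM_LAYERS = 12         # hypothetical backbone depth
M_SKIPPED = 4           # hypothetical number of layers to skip

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
    def forward(self, x):
        return x + self.net(x)

class ToyBackbone(nn.Module):
    """Stand-in for a frozen VLP encoder with one adapter per layer and a
    wider adapter on the path that replaces skipped layers."""
    def __init__(self, dim=256):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(NUM_LAYERS))
        self.adapters = nn.ModuleList(Adapter(dim, ADAPTER_DIM) for _ in range(NUM_LAYERS))
        self.skip_adapter = Adapter(dim, SKIP_ADAPTER_DIM)
    def forward(self, x, skipped):
        for i, (layer, adapter) in enumerate(zip(self.layers, self.adapters)):
            if i in skipped:
                x = self.skip_adapter(x)          # skip connection replaces the layer
            else:
                x = adapter(torch.relu(layer(x)))
        return x

def redundancy_scores(model):
    # Placeholder proxy for the paper's redundancy observation:
    # here we simply read the adapter weight norms as a stand-in signal.
    return [sum(p.norm().item() for p in a.parameters()) for a in model.adapters]

def run_epoch(model, optimizer, skipped, observe_every=None, steps=50, dim=256):
    for step in range(steps):
        x = torch.randn(8, dim)                    # synthetic batch
        loss = model(x, skipped).pow(2).mean()     # dummy objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if observe_every and (step + 1) % observe_every == 0:
            _ = redundancy_scores(model)           # redundancy observation

model = ToyBackbone()
optim = torch.optim.AdamW(
    [p for n, p in model.named_parameters() if "adapter" in n], lr=1e-4)

# 1) Warm up for one epoch with a randomly sampled set of skipped layers (given m).
run_epoch(model, optim, skipped=set(random.sample(range(NUM_LAYERS), M_SKIPPED)))

# 2) Search for two epochs, observing redundancy every 10 steps.
for _ in range(2):
    run_epoch(model, optim,
              skipped=set(random.sample(range(NUM_LAYERS), M_SKIPPED)),
              observe_every=10)

# 3) Fix the m most redundant layers as skipped and train the optimal
#    architecture for another ten epochs.
scores = redundancy_scores(model)
skipped = set(sorted(range(NUM_LAYERS), key=lambda i: scores[i])[:M_SKIPPED])
for _ in range(10):
    run_epoch(model, optim, skipped=skipped)
```

The three-stage loop mirrors the quoted setup (warm-up, search, final training); only the adapters are passed to the optimizer, reflecting the parameter-efficient nature of the method, while the choice of redundancy criterion is left as an explicit placeholder.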