Parameter-Efficient Multi-Task Model Fusion with Partial Linearization

Authors: Anke Tang, Li Shen, Yong Luo, Yibing Zhan, Han Hu, Bo Du, Yixin Chen, Dacheng Tao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the capabilities of our proposed partial linearization technique to effectively construct unified multi-task models via the fusion of fine-tuned task vectors. We evaluate performance over an increasing number of tasks and find that our approach outperforms standard parameter-efficient fine-tuning techniques. In this section, we conduct experiments on diverse tasks spanning both vision and natural language domains. (A hedged sketch of the task-vector fusion step follows the table.)
Researcher Affiliation | Collaboration | 1 Wuhan University, China; 2 JD Explore Academy, China; 3 Beijing Institute of Technology, China; 4 Washington University, USA; 5 Nanyang Technological University, Singapore
Pseudocode | Yes | Listing 1: PyTorch code to linearize a model. Listing 2: Linearize all the LoRA modules in an LLM. (A hedged sketch of such a linearization wrapper follows the table.)
Open Source Code | No | No explicit statement about providing open-source code for the described methodology or a direct link to a code repository was found.
Open Datasets | Yes | For vision tasks, we utilize CLIP (Radford et al., 2021) as our pre-trained model. For fine-tuning, we employ the CLIP-ViT-B-16 models on seven image classification tasks, using the same random seed 42 to initialize the parameter-efficient models. These tasks are Stanford Cars (Krause et al., 2013), DTD (Cimpoi et al., 2014), EuroSAT (Helber et al., 2018), GTSRB (Stallkamp et al., 2012), RESISC45 (Cheng et al., 2017), SUN397 (Xiao et al., 2010), and SVHN (Netzer et al., 2011). For NLP tasks, we utilize Flan-T5 (Chung et al., 2022) as our pre-trained language model. For fine-tuning, we employ the Flan-T5-base models on seven tasks derived from the GLUE benchmark (Wang et al., 2018) with the same random seed 42 to initialize the parameter-efficient models. (See the setup sketch after the table.)
Dataset Splits | Yes | For fine-tuning, we employ the CLIP-ViT-B-16 models on seven image classification tasks, using the same random seed 42 to initialize the parameter-efficient models. For NLP tasks, we utilize Flan-T5 (Chung et al., 2022) as our pre-trained language model. For fine-tuning, we employ the Flan-T5-base models on seven tasks derived from the GLUE benchmark (Wang et al., 2018) with the same random seed 42 to initialize the parameter-efficient models. The averaged model is subsequently evaluated on the validation set of each respective task to assess its performance. (The setup sketch after the table also covers loading the validation splits.)
Hardware Specification | Yes | All of our experiments were performed using the same hardware consisting of eight NVIDIA RTX 3090 GPUs with 24 GB of video memory each.
Software Dependencies | Yes | We used PyTorch 2.0 and Python 3 throughout all experiments.
Experiment Setup | Yes | Table 2: Hyperparameters for model fine-tuning. We use a batch size of 64, a learning rate of 1e-5, and a weight decay of 0.1 with a warm-up cosine scheduler for 6000 steps for all downstream tasks. For standard full fine-tuning, a constant learning rate of 1e-5 or 2e-5 was used for 2000 steps with a batch size of 16 and no weight decay. (See the optimizer/scheduler sketch after the table.)
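The "Research Type" row describes fusing fine-tuned task vectors into a single multi-task model. Below is a minimal PyTorch-style sketch of that fusion step, assuming standard task arithmetic with one shared scaling coefficient; the function name, the coefficient value of 0.3, and the use of plain state dicts are illustrative and not the paper's released code.

```python
def fuse_task_vectors(pretrained_state, finetuned_states, lam=0.3):
    """Fuse several fine-tuned checkpoints into one multi-task model.

    Each task vector is (fine-tuned weights - pre-trained weights); the fused
    model adds the scaled sum of task vectors back onto the pre-trained
    weights. Illustrative sketch only; lam=0.3 is an assumed coefficient.
    """
    fused = {name: w.clone() for name, w in pretrained_state.items()}
    for state in finetuned_states:
        for name in fused:
            fused[name] += lam * (state[name] - pretrained_state[name])
    return fused


# Hypothetical usage: merge checkpoints that share the same pre-trained base.
# fused = fuse_task_vectors(base.state_dict(), [m.state_dict() for m in task_models])
# base.load_state_dict(fused)
```

In the paper's setting, the vectors being fused would come from the partially linearized parameter-efficient modules rather than from full fine-tuning.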
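Listing 1 in the paper is reported as PyTorch code to linearize a model, and Listing 2 applies it to every LoRA module in an LLM. The wrapper below is a hedged reconstruction of the underlying idea only, a first-order Taylor expansion around the initial parameters computed with `torch.func.jvp`; the class and attribute names are my own, it assumes the wrapped module returns a single tensor, and the authors' listings may differ in detail.

```python
import torch
from torch import nn
from torch.func import functional_call, jvp


class LinearizedModule(nn.Module):
    """First-order Taylor expansion of a module around its initial parameters.

    The forward pass returns f(x; theta0) + J_f(x; theta0) (theta - theta0),
    computed with a Jacobian-vector product, so training only moves the model
    within its tangent space at initialization. Hedged sketch, not the paper's code.
    """

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module
        # Frozen copy of the initial parameters theta0 (kept as a plain dict).
        self.params0 = {
            name: p.detach().clone() for name, p in module.named_parameters()
        }

    def forward(self, *args, **kwargs):
        params = dict(self.module.named_parameters())
        # Displacement from initialization, theta - theta0 (carries the gradients).
        dparams = {name: params[name] - self.params0[name] for name in params}

        def f(p):
            return functional_call(self.module, p, args, kwargs)

        out0, jvp_out = jvp(f, (self.params0,), (dparams,))
        return out0 + jvp_out
```

Listing 2 would then presumably iterate over `model.named_modules()` and wrap only the LoRA layers, so that just those parameter-efficient modules are linearized and trained.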
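The "Open Datasets" and "Dataset Splits" rows name the pre-trained backbones, the random seed, and the evaluation splits. A minimal setup sketch follows; the Hugging Face checkpoint ids and the particular list of seven GLUE tasks are assumptions on my part, since the quotes only say "CLIP-ViT-B-16", "Flan-T5-base", and "seven tasks derived from the GLUE benchmark".

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, CLIPModel, CLIPProcessor

torch.manual_seed(42)  # the quoted setup fixes seed 42 for initializing the PEFT modules

# Vision backbone (checkpoint id is an assumed Hugging Face mirror of CLIP-ViT-B-16).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Language backbone (Flan-T5-base).
flan_t5 = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
flan_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Validation splits used to evaluate the fused model; the seven GLUE tasks
# listed here are a guess at the paper's selection.
glue_tasks = ["cola", "sst2", "mrpc", "qqp", "mnli", "qnli", "rte"]
val_sets = {
    task: load_dataset(
        "glue", task, split="validation_matched" if task == "mnli" else "validation"
    )
    for task in glue_tasks
}
```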
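The "Experiment Setup" row lists the fine-tuning hyperparameters. The sketch below wires them into a PyTorch optimizer and warm-up cosine schedule; the choice of AdamW, the placeholder model, and the warm-up length are assumptions, since the quote does not specify them.

```python
import torch
from torch import nn
from transformers import get_cosine_schedule_with_warmup

# Placeholder standing in for the parameter-efficient (e.g. LoRA) weights being tuned.
model = nn.Linear(768, 768)

# Quoted hyperparameters: lr 1e-5, weight decay 0.1, 6000 steps, batch size 64.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=600,      # assumed warm-up length; not stated in the quote
    num_training_steps=6000,   # "6000 steps for all downstream tasks"
)
```

For the standard full fine-tuning baseline, the quoted setup instead uses a constant learning rate of 1e-5 or 2e-5 for 2000 steps with a batch size of 16 and no weight decay.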