Representation Surgery for Multi-Task Model Merging
Authors: Enneng Yang, Li Shen, Zhenyi Wang, Guibing Guo, Xiaojun Chen, Xingwei Wang, Dacheng Tao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate significant MTL performance improvements when our Surgery module is applied to state-of-the-art (SOTA) model merging schemes. The code is available at https://github.com/EnnengYang/RepresentationSurgery. We conduct extensive experiments on eight tasks and three architectures, and the results show that when our Surgery module is applied to several advanced model merging schemes, the performance of the merged MTL model can be significantly improved. |
| Researcher Affiliation | Collaboration | 1Northeastern University, China. 2Sun Yat-sen University, China. 3JD Explore Academy, China. 4University of Maryland, USA. 5Shenzhen University, China. 6Nanyang Technological University, Singapore. |
| Pseudocode | No | The paper describes its method using equations and textual explanations, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | The code is available at https://github.com/EnnengYang/RepresentationSurgery. |
| Open Datasets | Yes | Following the setup of Task Arithmetic (Ilharco et al., 2023), Ties-Merging (Yadav et al., 2023) and AdaMerging (Yang et al., 2024), we treat the following eight datasets as eight tasks to perform model merging: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), DTD (Cimpoi et al., 2014). |
| Dataset Splits | Yes | The images of each class are divided roughly 1:1 into training set and test set. (Cars dataset description in Appendix A.1) and It contains 60,000 training images and 10,000 test images, each of which is 28x28 pixels. (MNIST dataset description in Appendix A.1) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or cloud computing resources. |
| Software Dependencies | No | The paper mentions the use of the 'Adam optimizer' and specific pre-trained models such as 'ViT architectures from CLIP's visual encoder', but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Specifically, we use the Adam optimizer (Kingma & Ba, 2014) to update these parameters with a learning rate of 1e-3 and a momentum of (0.9, 0.999). We update for 1,000 iterations with a batch size of 16. In addition, we set the rank (i.e., r in Eq. 3) of the surgery module to 16 by default, and we also tried values such as {4, 8, 16, 32, 64} in the experimental analysis. |
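
The experiment-setup row above is concrete enough to sketch in code. The snippet below is a minimal PyTorch sketch, not the authors' implementation: the low-rank form of the Surgery module (rank `r` from Eq. 3) is an assumption about the paper's bias-subtraction design, and `get_batch`, `merged_encoder`, `expert_encoder`, and the feature dimension of 512 are hypothetical placeholders. Only the reported hyperparameters (Adam, lr 1e-3, betas (0.9, 0.999), 1,000 iterations, batch size 16, rank 16) are taken from the paper.

```python
import torch
import torch.nn as nn

class SurgeryModule(nn.Module):
    """Hedged sketch of a rank-r representation-surgery adapter.

    Assumption: the module estimates a representation bias through a
    rank-r bottleneck and subtracts it from the merged model's feature.
    The exact form of Eq. 3 in the paper may differ.
    """
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)  # d -> r
        self.up = nn.Linear(rank, dim, bias=False)    # r -> d

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Remove the estimated low-rank bias from the merged representation.
        return z - self.up(self.down(z))

# Hyperparameters as reported in the table row above; dim=512 is assumed.
dim, rank = 512, 16
surgery = SurgeryModule(dim, rank)
optimizer = torch.optim.Adam(surgery.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Training skeleton: 1,000 iterations with batch size 16. The data loader and
# the two encoders below are placeholders for the actual merged / per-task models.
# for step in range(1000):
#     x = get_batch(batch_size=16)
#     z_merged = merged_encoder(x)           # representation of the merged model
#     z_expert = expert_encoder(x).detach()  # target representation of the expert
#     loss = (surgery(z_merged) - z_expert).abs().mean()
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```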