Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
Authors: Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the conventional multi-task model merging experiments and evaluate the generalization and robustness of our method. The results demonstrate the effectiveness and provide a comprehensive understanding of our method. |
| Researcher Affiliation | Collaboration | 1National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China 2Hubei Luojia Laboratory, Wuhan, China 3Sun Yat-sen University, Shenzhen, China 4JD Explore Academy, China 5Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates 6Nanyang Technological University, Singapore. |
| Pseudocode | No | The paper describes the mathematical representation of its components and modules but does not present a formal pseudocode block or algorithm. (A hedged illustrative sketch of the core idea is given after this table.) |
| Open Source Code | Yes | The code is available at https://github.com/tanganke/weight-ensembling_MoE. |
| Open Datasets | Yes | We fine-tune the models on eight distinct image classification tasks, namely SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2018), SVHN (Netzer et al., 2011), GTSRB (Stallkamp et al., 2012), MNIST (LeCun et al., 1998), and DTD (Cimpoi et al., 2014). |
| Dataset Splits | No | The paper mentions that a hyperparameter λ is chosen based on the model’s performance on a validation set, but it does not specify the size or exact split of this validation set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions) used for the experiments. |
| Experiment Setup | Yes | For all methods, unless explicitly specified, we follow the configuration in (Yang et al., 2023) and initialize the scaling coefficient of the task vector, denoted as λ, to 0.3 (see the merging sketch after this table). In Figure 4a, we merge CLIP-ViT-B/32 models with different learning rate configurations. |
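
To make the λ = 0.3 configuration concrete, here is a minimal sketch of standard task-arithmetic merging (the θ_merged = θ_pre + λ Σᵢ (θᵢ − θ_pre) rule that the scaling coefficient refers to). The function name and state-dict interface are illustrative assumptions, not code from the authors' repository:

```python
def merge_task_vectors(pretrained_sd, finetuned_sds, lam=0.3):
    """Task arithmetic: theta_merged = theta_pre + lam * sum_i (theta_i - theta_pre).

    pretrained_sd:  state dict (name -> tensor) of the pretrained model,
                    e.g. CLIP-ViT-B/32.
    finetuned_sds:  list of state dicts fine-tuned on the individual tasks.
    lam:            scaling coefficient for the summed task vectors
                    (initialized to 0.3 in the paper's setup).
    """
    merged = {}
    for name, w_pre in pretrained_sd.items():
        # Summed task vector across all fine-tuned checkpoints for this parameter.
        tau = sum(sd[name] - w_pre for sd in finetuned_sds)
        merged[name] = w_pre + lam * tau
    return merged
```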
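Since the paper itself provides no pseudocode, the sketch below illustrates one plausible reading of a weight-ensembling MoE layer: an input-conditioned router produces routing weights that compose the pretrained weight with per-task task vectors. The class name, router design, and token pooling here are assumptions for illustration, not the authors' exact implementation (see their repository for the real code):

```python
import torch
import torch.nn as nn

class WeightEnsemblingLinear(nn.Module):
    """Hypothetical sketch: per-input weight composition
    W(x) = W_pre + sum_i r_i(x) * tau_i, with routing weights r(x)
    produced by a small learned router."""

    def __init__(self, w_pre, task_vectors, in_dim):
        super().__init__()
        self.register_buffer("w_pre", w_pre)                      # (out_dim, in_dim)
        self.register_buffer("taus", torch.stack(task_vectors))   # (T, out_dim, in_dim)
        self.router = nn.Linear(in_dim, len(task_vectors))        # routing weights per input

    def forward(self, x):                  # x: (batch, tokens, in_dim)
        r = self.router(x.mean(dim=1))     # (batch, T), one weight per task vector
        # Compose a per-sample weight from the pretrained weight and task vectors.
        w = self.w_pre + torch.einsum("bt,toi->boi", r, self.taus)   # (batch, out, in)
        return torch.einsum("bni,boi->bno", x, w)  # batched linear with composed weight
```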