Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
Authors: Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the conventional multi-task model merging experiments and evaluate the generalization and robustness of our method. The results demonstrate the effectiveness and provide a comprehensive understanding of our method. |
| Researcher Affiliation | Collaboration | 1National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, China 2Hubei Luojia Laboratory, Wuhan, China 3Sun Yat-sen University, Shenzhen, China 4JD Explore Academy, China 5Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates 6Nanyang Technological University, Singapore. |
| Pseudocode | No | The paper describes the mathematical representation of its components and modules but does not present a formal pseudocode block or algorithm. (A hedged illustrative sketch of the core idea is given after this table.) |
| Open Source Code | Yes | The code is available at https://github.com/tanganke/weight-ensembling_MoE. |
| Open Datasets | Yes | We fine-tune the models on eight distinct image classification tasks, namely SUN397 (Xiao et al., 2010), Stanford Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2018), SVHN (Netzer et al., 2011), GTSRB (Stallkamp et al., 2012), MNIST (LeCun et al., 1998), and DTD (Cimpoi et al., 2014). |
| Dataset Splits | No | The paper mentions that a hyperparameter λ is chosen based on the model’s performance on a validation set, but it does not specify the size or exact split of this validation set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, library versions) used for the experiments. |
| Experiment Setup | Yes | For all methods, unless explicitly specified, we follow the configuration in (Yang et al., 2023) and initialize the scaling coefficient of the task vector, denoted as λ, to 0.3 (see the merging sketch after this table). In Figure 4a, we merge CLIP-ViT-B/32 models with different learning rate configurations. |
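
To make the λ = 0.3 configuration concrete, here is a minimal sketch of standard task-arithmetic merging (the θ_merged = θ_pre + λ Σᵢ (θᵢ − θ_pre) rule that the scaling coefficient refers to). The function name and state-dict interface are illustrative assumptions, not code from the authors' repository:

```python
def merge_task_vectors(pretrained_sd, finetuned_sds, lam=0.3):
    """Task arithmetic: theta_merged = theta_pre + lam * sum_i (theta_i - theta_pre).

    pretrained_sd:  state dict (name -> tensor) of the pretrained model,
                    e.g. CLIP-ViT-B/32.
    finetuned_sds:  list of state dicts fine-tuned on the individual tasks.
    lam:            scaling coefficient for the summed task vectors
                    (initialized to 0.3 in the paper's setup).
    """
    merged = {}
    for name, w_pre in pretrained_sd.items():
        # Summed task vector across all fine-tuned checkpoints for this parameter.
        tau = sum(sd[name] - w_pre for sd in finetuned_sds)
        merged[name] = w_pre + lam * tau
    return merged
```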
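Since the paper itself provides no pseudocode, the sketch below illustrates one plausible reading of a weight-ensembling MoE layer: an input-conditioned router produces routing weights that compose the pretrained weight with per-task task vectors. The class name, router design, and token pooling here are assumptions for illustration, not the authors' exact implementation (see their repository for the real code):

```python
import torch
import torch.nn as nn

class WeightEnsemblingLinear(nn.Module):
    """Hypothetical sketch: per-input weight composition
    W(x) = W_pre + sum_i r_i(x) * tau_i, with routing weights r(x)
    produced by a small learned router."""

    def __init__(self, w_pre, task_vectors, in_dim):
        super().__init__()
        self.register_buffer("w_pre", w_pre)                      # (out_dim, in_dim)
        self.register_buffer("taus", torch.stack(task_vectors))   # (T, out_dim, in_dim)
        self.router = nn.Linear(in_dim, len(task_vectors))        # routing weights per input

    def forward(self, x):                  # x: (batch, tokens, in_dim)
        r = self.router(x.mean(dim=1))     # (batch, T), one weight per task vector
        # Compose a per-sample weight from the pretrained weight and task vectors.
        w = self.w_pre + torch.einsum("bt,toi->boi", r, self.taus)   # (batch, out, in)
        return torch.einsum("bni,boi->bno", x, w)  # batched linear with composed weight
```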