EMR-Merging: Tuning-Free High-Performance Model Merging
Authors: Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that EMR-MERGING shows outstanding performance compared to existing merging methods under different classical and newly-established settings, including merging different numbers of vision models (up to 30), NLP models, PEFT models, and multi-modal models. |
| Researcher Affiliation | Collaboration | Chenyu Huang¹, Peng Ye¹٬³, Tao Chen¹, Tong He², Xiangyu Yue³, Wanli Ouyang³ — ¹Fudan University, ²Shanghai AI Laboratory, ³The Chinese University of Hong Kong |
| Pseudocode | Yes | We summarize the procedure of EMR-MERGING in Algorithm 1. Algorithm 1 EMR-MERGING Procedure |
| Open Source Code | Yes | Our code is available at https://github.com/harveyhuang18/EMR_Merging. |
| Open Datasets | Yes | We employ ViT-B/32 and ViT-L/14, two variants of the CLIP [54] visual encoders, as the pre-trained models. The performance of each method is evaluated on eight image classification tasks, including SUN397 [83], Cars [35], RESISC45 [10], EuroSAT [27], SVHN [91], GTSRB [65], MNIST [38], and DTD [11]. |
| Dataset Splits | No | While the paper refers to 'validation data' (e.g., in Table 1 and Table 7), it does not explicitly provide the specific training/validation/test split percentages or sample counts for the datasets used in its experiments. It mentions following settings from other papers but does not detail the splits here. |
| Hardware Specification | No | The paper mentions that 'The computations in this research were performed using the CFFF platform of Fudan University' in the Acknowledgement section. However, this is a general platform reference and does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper refers to software libraries like Huggingface [79], timm [77], and torchvision [44] through citations. However, it does not provide specific version numbers for these or any other ancillary software components needed to replicate the experiments. |
| Experiment Setup | No | The paper states 'We follow the setting from Task Arithmetic [30], Ties-Merging [84], and AdaMerging [85]' (Section 4.1.1) and provides model details (e.g., ViT-B/32, RoBERTa-base). However, it does not explicitly list specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations for its experiments. |
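Since the paper's pseudocode (Algorithm 1) is confirmed but not reproduced here, the following is a minimal NumPy sketch of the Elect-Mask-Rescale procedure as commonly described for EMR-Merging: a single unified task vector is elected across tasks, and each task keeps only a lightweight binary mask and a scalar rescaler. Function and variable names are my own, not the paper's; treat this as an illustrative sketch, not the reference implementation.

```python
import numpy as np

def emr_merge(pretrained, finetuned_list):
    """Sketch of EMR-Merging (Elect, Mask, Rescale) on flattened weights."""
    # Task vectors: difference between each fine-tuned model and the base.
    taus = [ft - pretrained for ft in finetuned_list]
    stacked = np.stack(taus)                       # (n_tasks, n_params)

    # Elect: the unified sign is the sign of the element-wise sum; the
    # unified amplitude is the largest magnitude among task vectors that
    # agree with the elected sign.
    gamma = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == gamma              # sign-agreement mask
    eps = np.where(agree, np.abs(stacked), 0.0).max(axis=0)
    tau_uni = gamma * eps                          # unified task vector

    # Per-task lightweight modulators kept alongside the unified vector.
    masks, scales = [], []
    for tau in taus:
        m = (tau * tau_uni) > 0                    # Mask: shared-sign elements
        masked = m * tau_uni
        # Rescaler: match the average magnitude of the original task vector.
        lam = np.abs(tau).sum() / max(np.abs(masked).sum(), 1e-12)
        masks.append(m)
        scales.append(lam)
    return tau_uni, masks, scales

def reconstruct(pretrained, tau_uni, mask, scale):
    # Approximate the task-specific model at inference time.
    return pretrained + scale * (mask * tau_uni)
```

As a sanity check on the sketch, merging a single model should reconstruct it exactly, since its mask keeps every nonzero element and the rescaler reduces to 1.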