EMR-Merging: Tuning-Free High-Performance Model Merging

Authors: Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that EMR-MERGING shows outstanding performance compared to existing merging methods under different classical and newly-established settings, including merging different numbers of vision models (up to 30), NLP models, PEFT models, and multi-modal models.
Researcher Affiliation | Collaboration | Chenyu Huang1, Peng Ye1,3, Tao Chen1, Tong He2, Xiangyu Yue3, Wanli Ouyang3. 1 Fudan University; 2 Shanghai AI Laboratory; 3 The Chinese University of Hong Kong
Pseudocode | Yes | We summarize the procedure of EMR-MERGING in Algorithm 1 (Algorithm 1: EMR-MERGING Procedure).
Open Source Code | Yes | Our code is available at https://github.com/harveyhuang18/EMR_Merging.
Open Datasets | Yes | We employ ViT-B/32 and ViT-L/14, two variants of the visual encoders of CLIP [54] models, as the pre-trained models. The performance of each method is evaluated on eight image classification tasks: SUN397 [83], Cars [35], RESISC45 [10], EuroSAT [27], SVHN [91], GTSRB [65], MNIST [38], and DTD [11].
Dataset Splits | No | While the paper refers to 'validation data' (e.g., in Table 1 and Table 7), it does not explicitly provide the specific training/validation/test split percentages or sample counts for the datasets used in its experiments. It mentions following settings from other papers but does not detail the splits.
Hardware Specification | No | The paper mentions that 'The computations in this research were performed using the CFFF platform of Fudan University' in the Acknowledgement section. However, this is a general platform reference and does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper refers to software libraries such as Huggingface [79], timm [77], and torchvision [44] through citations. However, it does not provide specific version numbers for these or any other ancillary software components needed to replicate the experiments.
Experiment Setup | No | The paper states 'We follow the setting from Task Arithmetic [30], Ties-Merging [84], and AdaMerging [85]' (Section 4.1.1) and provides model details (e.g., ViT-B/32, RoBERTa-base). However, it does not explicitly list specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations for its experiments.
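For context on the Pseudocode row above: EMR-Merging elects a unified task vector from the per-task vectors, then recovers each task with a lightweight binary mask and a scalar rescaler. The following is a minimal NumPy sketch of that elect/mask/rescale idea on flattened weight vectors, not the authors' implementation from Algorithm 1; the function names (`emr_merge`, `reconstruct`) and the rescaler definition (matching each task vector's L1 norm) are illustrative assumptions.

```python
import numpy as np

def emr_merge(task_vectors):
    """Sketch of an elect/mask/rescale merge over flat task vectors.

    task_vectors: list of 1-D arrays tau_t = theta_t - theta_pre.
    Returns the elected unified task vector, per-task boolean masks,
    and per-task scalar rescalers.
    """
    taus = np.stack(task_vectors)                      # (T, d)
    # Elect: unified sign from the summed task vectors; unified
    # magnitude as the largest |tau_t| among sign-agreeing entries.
    sign_uni = np.sign(taus.sum(axis=0))
    agree = (np.sign(taus) == sign_uni) & (sign_uni != 0)
    mag_uni = np.where(agree, np.abs(taus), 0.0).max(axis=0)
    tau_uni = sign_uni * mag_uni                       # (d,)
    # Mask: keep entries where each task vector agrees with tau_uni.
    masks = (taus * tau_uni) > 0                       # (T, d), boolean
    # Rescale (assumed form): match each task vector's L1 norm.
    denom = np.abs(masks * tau_uni).sum(axis=1)
    lambdas = np.abs(taus).sum(axis=1) / np.maximum(denom, 1e-12)
    return tau_uni, masks, lambdas

def reconstruct(theta_pre, tau_uni, mask, lam):
    """Approximate one task's weights at inference time."""
    return theta_pre + lam * (mask * tau_uni)
```

Because the merge stores only one dense vector plus cheap per-task masks and scalars, it needs no tuning data, which is consistent with the "tuning-free" claim in the title.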