Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 20 datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks. |
| Researcher Affiliation | Collaboration | 1 School of Computer Science & Technology, Huazhong University of Science and Technology, 2 Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), 3 Ping An Property & Casualty Insurance Company of China, Ltd., 4 The Chinese University of Hong Kong. |
| Pseudocode | Yes | Algorithm 1 Twin-Merging |
| Open Source Code | Yes | Our implementation is available in https://github.com/LZY-the-boys/Twin-Merging |
| Open Datasets | Yes | For language discriminative tasks, following [76, 79], we use RoBERTa [42] as the backbone and evaluate on the 8-task GLUE benchmark [69]... QNLI, CoLA, and STS-B are licensed under CC-BY-SA. QQP is licensed under MIT. SST-2 and MRPC are licensed under Apache 2.0. MNLI is licensed under OANC. RTE is licensed under CC BY 4.0. Thus, these datasets in GLUE are available for non-commercial research purposes. |
| Dataset Splits | Yes | We split 10% of the training set as a validation set and employ the original validation data as the test set. |
| Hardware Specification | Yes | We executed all our experiments on Nvidia A100 GPUs equipped with 80GB RAM. |
| Software Dependencies | No | The paper mentions frameworks and models like RoBERTa, Qwen-14B, and LoRA but does not specify version numbers for general software dependencies or libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Our selected hyperparameters included a batch size of 64 and a learning rate set at 1e-5. For generative tasks, the fine-tuning process for Qwen-14B involved the utilization of LoRA with a rank set to 32, a batch size of 128, and a learning rate of 2e-4 for 3 epochs. |
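The Pseudocode row points to Algorithm 1 (Twin-Merging), which modularizes knowledge into a shared expert plus compressed task-exclusive deltas and then merges them dynamically per input via a router. Below is a minimal Python sketch of that input-conditioned merging step; the router architecture, tensor shapes, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the dynamic-merging idea behind Twin-Merging:
# theta(x) = theta_shared + sum_t w_t(x) * v_t, where theta_shared is the shared
# expert and v_t are task-exclusive deltas. All shapes and names are illustrative.
import torch
import torch.nn as nn


def merge_for_input(shared: dict, exclusives: list, weights: torch.Tensor) -> dict:
    """Compose input-conditioned parameters from the shared expert and the
    router-weighted task-exclusive deltas."""
    merged = {}
    for name, base in shared.items():
        delta = sum(w * exc[name] for w, exc in zip(weights, exclusives))
        merged[name] = base + delta
    return merged


class Router(nn.Module):
    """Toy router: maps a pooled input representation to softmax weights
    over the task-exclusive experts (one scalar weight per expert)."""

    def __init__(self, hidden_dim: int, num_experts: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_experts)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.proj(pooled), dim=-1)


if __name__ == "__main__":
    hidden, num_experts = 16, 3
    # Shared expert: e.g. a merge of all fine-tuned models.
    shared = {"linear.weight": torch.randn(hidden, hidden)}
    # Exclusive experts: per-task deltas (fine-tuned minus shared), compressed in the paper.
    exclusives = [{"linear.weight": 0.01 * torch.randn(hidden, hidden)} for _ in range(num_experts)]

    router = Router(hidden, num_experts)
    pooled_input = torch.randn(1, hidden)   # stand-in for a pooled input embedding
    w = router(pooled_input).squeeze(0)     # per-expert weights for this input
    theta_x = merge_for_input(shared, exclusives, w)
    print({k: v.shape for k, v in theta_x.items()})
```

The router weights are recomputed for each input, so the merged parameters adapt to the example at hand rather than being fixed once for all tasks.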
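The Dataset Splits row states that 10% of each training set is held out for validation while the original validation split serves as the test set. A small sketch of that split using Hugging Face `datasets`; the GLUE task and seed chosen here are assumptions.

```python
# Hold out 10% of the training data as validation and reuse the original
# validation split as the test set, as described in the Dataset Splits row.
from datasets import load_dataset

glue = load_dataset("glue", "rte")                       # task choice is an assumption
held_out = glue["train"].train_test_split(test_size=0.1, seed=42)  # seed is an assumption

train_set = held_out["train"]    # 90% of the original training data
val_set = held_out["test"]       # 10% held out as validation
test_set = glue["validation"]    # original validation split used as the test set
```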
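The Experiment Setup row reports LoRA fine-tuning of Qwen-14B with rank 32, a batch size of 128, a learning rate of 2e-4, and 3 epochs. A hedged reconstruction of that configuration with `peft` and `transformers` follows; values marked as assumptions are not given in the excerpt.

```python
# Hedged reconstruction of the generative-task fine-tuning setup quoted above.
# lora_alpha, lora_dropout, target_modules, bf16, and the batch/accumulation
# split are assumptions; rank, learning rate, and epochs come from the excerpt.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                          # rank reported in the excerpt
    lora_alpha=64,                 # assumption
    lora_dropout=0.05,             # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen14b-lora",
    per_device_train_batch_size=16,   # assumption: 16 x 8 accumulation = effective batch 128
    gradient_accumulation_steps=8,
    learning_rate=2e-4,               # reported learning rate
    num_train_epochs=3,               # reported number of epochs
    bf16=True,                        # assumption, consistent with the A100 hardware
)
```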