Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

Authors: Zhenyi Lu, Chenghao Fan, Wei Wei, Xiaoye Qu, Dangyang Chen, Yu Cheng

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 20 datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on the generative tasks.
Researcher Affiliation | Collaboration | 1) School of Computer Science & Technology, Huazhong University of Science and Technology; 2) Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL); 3) Ping An Property & Casualty Insurance Company of China, Ltd.; 4) The Chinese University of Hong Kong.
Pseudocode | Yes | Algorithm 1: Twin-Merging (an illustrative sketch of the two-stage procedure follows the table).
Open Source Code | Yes | Our implementation is available at https://github.com/LZY-the-boys/Twin-Merging
Open Datasets | Yes | For language discriminative tasks, following [76, 79], we use RoBERTa [42] as the backbone and evaluate on the 8-task GLUE benchmark [69]... QNLI, CoLA, and STS-B are licensed under CC-BY-SA. QQP is licensed under MIT. SST-2 and MRPC are licensed under Apache 2.0. MNLI is licensed under OANC. RTE is licensed under CC BY 4.0. Thus, these GLUE datasets are available for non-commercial research purposes.
Dataset Splits | Yes | We split 10% of the training set as a validation set and employ the original validation data as the test set (see the data-loading sketch after the table).
Hardware Specification | Yes | We executed all our experiments on Nvidia A100 GPUs equipped with 80GB RAM.
Software Dependencies | No | The paper mentions frameworks and models like RoBERTa, Qwen-14B, and LoRA but does not specify version numbers for general software dependencies or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Our selected hyperparameters included a batch size of 64 and a learning rate set at 1e-5. For generative tasks, the fine-tuning process for Qwen-14B involved the use of LoRA with a rank of 32, a batch size of 128, and a learning rate of 2e-4 for 3 epochs (restated in the configuration sketch after the table).
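
The paper's Algorithm 1 describes a two-stage procedure: knowledge is first modularized into a shared expert plus compressed task-exclusive differences, which are then recombined dynamically at test time. The following is a minimal sketch of that idea, assuming PyTorch state dicts; the plain parameter average standing in for the shared-expert merge, the truncation rank, and all function names are illustrative assumptions rather than the authors' exact implementation.

```python
# Sketch of the Twin-Merging idea:
# (1) build a shared expert from the task-specific fine-tuned weights,
# (2) keep each task's "exclusive" knowledge as an SVD-compressed difference
#     from the shared expert,
# (3) at test time, combine shared + exclusive knowledge with router weights.
import torch


def build_shared_expert(finetuned: list[dict]) -> dict:
    """Average the fine-tuned checkpoints to obtain the shared expert.
    (A simple mean stands in for whatever merging scheme is actually used.)"""
    shared = {}
    for key in finetuned[0]:
        shared[key] = torch.stack([sd[key] for sd in finetuned]).mean(dim=0)
    return shared


def build_exclusive_expert(task_sd: dict, shared: dict, rank: int = 32) -> dict:
    """Task-exclusive knowledge = fine-tuned - shared, compressed by truncated SVD."""
    exclusive = {}
    for key, weight in task_sd.items():
        delta = weight - shared[key]
        if delta.ndim == 2:  # compress 2-D weight matrices only
            u, s, vh = torch.linalg.svd(delta, full_matrices=False)
            k = min(rank, s.numel())
            delta = (u[:, :k] * s[:k]) @ vh[:k]
        exclusive[key] = delta
    return exclusive


def dynamic_merge(shared: dict, exclusives: list[dict], router_weights: torch.Tensor) -> dict:
    """Input-conditioned merge: theta(x) = shared + sum_t w_t(x) * exclusive_t."""
    merged = {}
    for key in shared:
        merged[key] = shared[key] + sum(
            w * expert[key] for w, expert in zip(router_weights.tolist(), exclusives)
        )
    return merged
```

At inference time, `dynamic_merge` would be invoked per input (or per batch) with router weights produced by a small router trained on input representations, which is what makes the merge dynamic rather than a single static checkpoint.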
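
The data protocol quoted in the Open Datasets and Dataset Splits rows (8 GLUE tasks, 10% of the training set held out for validation, original validation split used as the test set) can be expressed with the Hugging Face `datasets` library. This is an illustrative sketch, not the authors' released preprocessing; the fixed seed and the choice of MNLI's matched validation split are assumptions.

```python
# Illustrative reconstruction of the split protocol described above.
from datasets import load_dataset

glue_tasks = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte"]

splits = {}
for task in glue_tasks:
    raw = load_dataset("glue", task)
    held_out = raw["train"].train_test_split(test_size=0.1, seed=42)
    splits[task] = {
        "train": held_out["train"],        # 90% of the original training data
        "validation": held_out["test"],    # 10% held out for model selection
        # GLUE test labels are hidden, so the original validation split serves as the test set.
        "test": raw["validation_matched"] if task == "mnli" else raw["validation"],
    }
```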
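
The hyperparameters quoted in the Experiment Setup row translate naturally into Hugging Face Transformers/PEFT configuration objects. Only the batch sizes, learning rates, LoRA rank, and epoch count come from the paper; the LoRA alpha, target module names, output paths, and all remaining defaults are assumptions added for illustration.

```python
# Hedged configuration sketch of the reported training hyperparameters.
from transformers import TrainingArguments
from peft import LoraConfig

# Discriminative tasks: fine-tuning RoBERTa on GLUE (batch size 64, lr 1e-5).
roberta_args = TrainingArguments(
    output_dir="roberta-glue",            # placeholder path
    per_device_train_batch_size=64,
    learning_rate=1e-5,
)

# Generative tasks: LoRA fine-tuning of Qwen-14B (rank 32, batch size 128, lr 2e-4, 3 epochs).
qwen_lora = LoraConfig(
    r=32,
    lora_alpha=64,                        # assumed; not stated in the paper
    target_modules=["q_proj", "v_proj"],  # placeholder; module names depend on the architecture
)
qwen_args = TrainingArguments(
    output_dir="qwen14b-lora",            # placeholder path
    per_device_train_batch_size=128,
    learning_rate=2e-4,
    num_train_epochs=3,
)
```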