Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Authors: Fanhu Zeng, Haiyang Guo, Fei Zhu, Li Shen, Hao Tang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase the effectiveness. |
| Researcher Affiliation | Academia | (1) MAIS, Institute of Automation, Chinese Academy of Sciences; (2) Centre for Artificial Intelligence and Robotics, HKISI-CAS; (3) Shenzhen Campus of Sun Yat-sen University; (4) Shenzhen Loop Area Institute; (5) State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University |
| Pseudocode | Yes | Algorithm 1: Procedure of parameter-efficient merging with complementary parameter adaptation. Input: fine-tuned models {A_n, B_n}, n = 1, …, N; pruning rate k; rank r; and λ. Output: merged parameter-efficient model W. Step 1: Pruning and complementary parameter scaling. |
| Open Source Code | Yes | Code is available at https://github.com/AuroraZengfh/RobustMerge. |
| Open Datasets | Yes | For multimodal task merging, we establish a Multimodal Merging Benchmark (MM-MergeBench), which comprises eight multimodal generative tasks including ScienceQA [36], ImageNet [5], VQAv2 [7], REC-COCO [22, 39], OCRVQA [42], Flickr30k [44], VizWiz-caption [12], IconQA [38]. To demonstrate the generalizability on unseen tasks, we evaluate the merged models on four diverse datasets, ImageNet-R [15], AOKVQA [48], Screen2Word [58], TabMWP [37]. |
| Dataset Splits | Yes | For training, we follow the standard training procedure described in LLaVA, i.e., training each task individually and obtaining parameter-efficient modules. |
| Hardware Specification | Yes | All merging experiments are carried out on a single NVIDIA A6000 with the temperature set to 0. |
| Software Dependencies | No | The paper mentions LLaVA [3] and CLIP [4] as foundational models but does not specify software dependencies with version numbers, such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | Unless otherwise stated, all models are trained with a rank of 16. More details can be found in Appendix C. [...] For training, we follow the standard training procedure described in LLaVA, i.e., training each task individually and obtaining parameter-efficient modules. LoRA is added to linear layers in foundational blocks, and all models are trained for 1 epoch for merging. [...] Hyper-parameter λ is set to 2 by default. |
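Algorithm 1 and the experiment setup describe merging N per-task LoRA modules {A_n, B_n} of rank r (16 by default) using a pruning rate k and a hyperparameter λ. The sketch below is a generic prune-then-average baseline, assuming plain magnitude pruning; it is not the RobustMerge procedure. It forms each task's low-rank update ΔW_n = B_n A_n, zeroes the fraction k of smallest-magnitude entries, and averages the results. The complementary parameter scaling controlled by λ is deliberately left out because its exact form is not stated in this report, and all function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    inner, cols = len(y), len(y[0])
    return [[sum(row[t] * y[t][j] for t in range(inner)) for j in range(cols)] for row in x]


def lora_delta(a: Matrix, b: Matrix) -> Matrix:
    """Low-rank update Delta W = B @ A, with B: d_out x r and A: r x d_in (rank r = len(a))."""
    return matmul(b, a)


def magnitude_prune(w: Matrix, k: float) -> Matrix:
    """Zero the fraction k of entries with the smallest absolute value.

    Assumption: plain magnitude pruning; entries tied with the threshold are also zeroed.
    """
    flat = sorted(abs(v) for row in w for v in row)
    n_drop = int(k * len(flat))
    if n_drop == 0:
        return [row[:] for row in w]
    threshold = flat[n_drop - 1]
    return [[v if abs(v) > threshold else 0.0 for v in row] for row in w]


def merge_lora_modules(modules: Sequence[Tuple[Matrix, Matrix]], k: float) -> Matrix:
    """Hypothetical baseline: prune each task's B_n @ A_n at rate k, then take the mean.

    No lambda-based complementary scaling is applied here.
    """
    deltas = [magnitude_prune(lora_delta(a, b), k) for a, b in modules]
    n = len(deltas)
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [[sum(d[i][j] for d in deltas) / n for j in range(cols)] for i in range(rows)]


if __name__ == "__main__":
    # Each task is (A_n, B_n) with d_out = d_in = 2 and rank r = 1.
    task1 = ([[2.0, 0.5]], [[1.0], [0.0]])
    task2 = ([[0.1, 4.0]], [[0.0], [1.0]])
    print(merge_lora_modules([task1, task2], k=0.75))  # [[1.0, 0.0], [0.0, 2.0]]
```

Averaging pruned updates is only one possible merge rule. With rank 16, each A_n would have 16 rows and each B_n 16 columns; the merged ΔW keeps the full d_out × d_in shape.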