Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness

Authors: Fanhu Zeng, Haiyang Guo, Fei Zhu, Li Shen, Hao Tang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on a benchmark consisting of diverse multimodal tasks, on which we conduct experiments to certify the outstanding performance and generalizability of our method. Additional studies and extensive analyses further showcase the effectiveness.
Researcher Affiliation	Academia	(1) MAIS, Institute of Automation, Chinese Academy of Sciences; (2) Centre for Artificial Intelligence and Robotics, HKISI-CAS; (3) Shenzhen Campus of Sun Yat-sen University; (4) Shenzhen Loop Area Institute; (5) State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Pseudocode	Yes	Algorithm 1: Procedure of parameter-efficient merging with complementary parameter adaptation. Input: fine-tuned models {(A_n, B_n)}_{n=1}^{N}, pruning rate k, rank r, and λ. Output: merged parameter-efficient model W. Step 1: Pruning and complementary parameter scaling.
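Algorithm 1 is only excerpted here, so the following is a minimal, hypothetical sketch of the input/output signature it describes, not the authors' method. It assumes each task module is a LoRA pair with `A_n` of shape (r, d_in) and `B_n` of shape (d_out, r). It swaps in plain magnitude pruning at rate `k`, a norm-preserving rescale as a stand-in for complementary parameter scaling, simple averaging across tasks, and a truncated SVD to return a rank-`r` pair. The role of λ is not described in this entry and is not modeled; the function names `magnitude_prune` and `merge_lora` are hypothetical.

```python
import numpy as np


def magnitude_prune(delta: np.ndarray, k: float) -> np.ndarray:
    """Zero roughly the fraction k of entries with the smallest magnitude."""
    if not 0.0 <= k < 1.0:
        raise ValueError("pruning rate k must lie in [0, 1)")
    n_drop = int(k * delta.size)
    if n_drop == 0:
        return delta.copy()
    magnitudes = np.abs(delta)
    # Magnitude of the n_drop-th smallest entry; entries at or below it are zeroed
    # (ties at the threshold may drop a few extra entries).
    threshold = np.partition(magnitudes.ravel(), n_drop - 1)[n_drop - 1]
    return np.where(magnitudes > threshold, delta, 0.0)


def merge_lora(modules: list[tuple[np.ndarray, np.ndarray]], k: float, r: int):
    """Hypothetical prune -> rescale -> average -> rank-r refactorization."""
    if not modules:
        raise ValueError("need at least one (A, B) module")
    merged = None
    for A, B in modules:
        delta = B @ A                         # dense low-rank update, shape (d_out, d_in)
        pruned = magnitude_prune(delta, k)
        kept = np.linalg.norm(pruned)         # Frobenius norm of surviving entries
        if kept > 0:
            # Stand-in rescale: restore the update's original Frobenius norm.
            pruned = pruned * (np.linalg.norm(delta) / kept)
        merged = pruned if merged is None else merged + pruned
    merged = merged / len(modules)            # simple task average (assumption)
    if r > min(merged.shape):
        raise ValueError("rank r exceeds the matrix dimensions")
    # Truncated SVD gives the best rank-r approximation: merged ~= B_m @ A_m.
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    B_m = U[:, :r] * S[:r]                    # shape (d_out, r)
    A_m = Vt[:r, :]                           # shape (r, d_in)
    return A_m, B_m


rng = np.random.default_rng(0)
d_out, d_in, rank = 32, 24, 4
modules = [(rng.normal(size=(rank, d_in)), rng.normal(size=(d_out, rank))) for _ in range(3)]
A_m, B_m = merge_lora(modules, k=0.5, r=rank)
print(A_m.shape, B_m.shape)  # (4, 24) (32, 4)
```

Refactoring through a truncated SVD keeps the output in the same low-rank form as the inputs, matching the "merged parameter-efficient model" output named in Algorithm 1; whether the paper merges in product space like this is not stated in this entry.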
Open Source Code	Yes	Code is available at https://github.com/AuroraZengfh/RobustMerge.
Open Datasets	Yes	For multimodal task merging, we establish a MultiModal Merging Benchmark (MM-MergeBench), which comprises eight multimodal generative tasks including ScienceQA [36], ImageNet [5], VQAv2 [7], REC-COCO [22, 39], OCRVQA [42], Flickr30k [44], VizWiz-caption [12], and IconQA [38]. To demonstrate the generalizability on unseen tasks, we evaluate the merged models on four diverse datasets: ImageNet-R [15], AOKVQA [48], Screen2Word [58], and TabMWP [37].
Dataset Splits	Yes	For training, we follow the standard training procedure described in LLaVA, i.e., training each task individually and obtaining parameter-efficient modules.
Hardware Specification	Yes	All merging experiments are carried out on a single NVIDIA A6000 with the temperature set to 0.
Software Dependencies	No	The paper mentions LLaVA and CLIP as foundational models (referenced as LLaVA 3 and CLIP 4) but does not specify software dependencies with version numbers, such as Python, PyTorch, or CUDA versions.
Experiment Setup	Yes	Unless otherwise stated, all models are trained with a rank of 16. More details can be found in Appendix C. [...] For training, we follow the standard training procedure described in LLaVA, i.e., training each task individually and obtaining parameter-efficient modules. LoRA is added to linear layers in foundational blocks, and all models are trained for 1 epoch for merging. [...] Hyper-parameter λ is set to 2 by default.
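To make the stated settings easy to check against a reproduction attempt, they can be collected in a small record. This is a hypothetical container, not part of the released code; it holds only values stated in this entry, and settings such as learning rate or LoRA scaling, which may appear in Appendix C, are deliberately left out.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Settings stated in this index entry (hypothetical record, not the authors' config)."""
    lora_rank: int = 16                  # "all models are trained with a rank of 16"
    lora_targets: str = "linear layers in foundational blocks"
    epochs_per_task: int = 1             # each task is trained individually for 1 epoch
    merge_lambda: float = 2.0            # "Hyper-parameter λ is set to 2 by default"
    decoding_temperature: float = 0.0    # merging experiments run with temperature 0
    gpu: str = "NVIDIA A6000"            # a single GPU for all merging experiments


setup = ReportedSetup()
print(asdict(setup)["lora_rank"])  # 16
```

Freezing the dataclass keeps the recorded values from being changed accidentally during a run.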