Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades

Authors: Yanan Li, Fanxu Meng, Muhan Zhang, Shiai Zhu, Shangguang Wang, Mengwei Xu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental evaluations demonstrate that LoRASuite consistently surpasses small-scale vanilla LoRA methods. Notably, on backbone LLMs such as MiniCPM and Qwen, LoRASuite even exceeds the performance of full-scale LoRA retraining, with average improvements of +1.4 and +6.6 points on math tasks, respectively. Additionally, LoRASuite significantly reduces memory consumption by 5.5 GB and computational time by 78.23%.
Researcher Affiliation	Academia	Yanan Li, Fanxu Meng, Muhan Zhang, Shiai Zhu, Shangguang Wang, Mengwei Xu; CSCN Beijing University of Posts and Telecommunications; Peking University; Unaffiliated; EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: CKA-based Layer Mapping. Algorithm 2: Pseudocode of LoRASuite in PyTorch-like style.
Open Source Code	Yes	Our implementation¹ builds on the publicly available Huggingface transformers and peft libraries, with modifications to support initialization from transformed weights. ¹ https://github.com/YananLi18/LoRASuite
Open Datasets	Yes	For commonsense reasoning tasks, we evaluate BoolQ [45], PIQA [46], SIQA [47], HellaSwag [48], WinoGrande [49], ARC-e, ARC-c [50], and OBQA [51]. For math tasks, we evaluate AQuA [52], GSM8K [53], MAWPS [54], and SVAMP [55]. To ensure reproducibility, we use the same training and evaluation datasets as LLM-Adapters [56].
Dataset Splits	Yes	Both the original and upgraded models LoRA (10k) are trained on the full dataset with 10k samples, whereas LoRA (100) and LoRASuite with LFT (100) are trained on identical, randomly selected subsets of the LoRA (10k) dataset with 100 samples.
Hardware Specification	Yes	All experiments were conducted on a Linux server equipped with 80 Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz cores, 500 GB RAM, and 8 NVIDIA V100 GPUs.
Software Dependencies	No	Our implementation¹ builds on the publicly available Huggingface transformers and peft libraries, with modifications to support initialization from transformed weights.
Experiment Setup	Yes	Unless otherwise specified, we use the default LoRA settings: rank 32, alpha 32, and dropout 0. A sensitivity analysis of these parameters is provided in Section 4.2. By default, the target modules include q_proj, k_proj, v_proj, and o_proj, with performance for other modules detailed in the Appendix A.3. Tables 8 and 9 detail the hyperparameters of LoRA and LoRASuite for commonsense and math tasks, respectively. Hyperparameters (LoRA / LoRASuite): Rank 32; α 32; Dropout 0; Optimizer AdamW; LR 3e-4 / 1e-3; LR Scheduler Linear; Batch Size 16; Warmup Ratio 0.1 / 0; Epochs 3; Target Module Q, K, V, O.
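
The settings quoted in the Experiment Setup row can be collected into a small configuration object, which makes it easier to see where the two methods differ. The following is a minimal sketch using only the Python standard library. The class name, field names, and helper function are illustrative assumptions and are not taken from the LoRASuite repository. It also assumes that, where the flattened hyperparameter table lists two values (learning rate and warmup ratio), the first belongs to LoRA and the second to LoRASuite, following the order of the column headers.

```python
from dataclasses import dataclass, asdict

# Values transcribed from the Experiment Setup row. The class and field
# names are hypothetical and chosen only for this illustration.
@dataclass(frozen=True)
class AdapterSetup:
    rank: int = 32
    alpha: int = 32
    dropout: float = 0.0
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    optimizer: str = "AdamW"
    lr_scheduler: str = "linear"
    batch_size: int = 16
    epochs: int = 3
    learning_rate: float = 3e-4
    warmup_ratio: float = 0.1

# Assumption: in the two-valued rows, LoRA is listed first, LoRASuite second.
LORA = AdapterSetup()
LORASUITE = AdapterSetup(learning_rate=1e-3, warmup_ratio=0.0)

def differing_fields(a, b):
    """Return {field: (a_value, b_value)} for each field where a and b differ."""
    da, db = asdict(a), asdict(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}

if __name__ == "__main__":
    for name, (lora_value, suite_value) in differing_fields(LORA, LORASUITE).items():
        print(f"{name}: LoRA={lora_value}, LoRASuite={suite_value}")
```

Under these assumptions, only the learning rate and the warmup ratio differ between the two setups; rank, alpha, dropout, and target modules are shared.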