Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

The Primacy of Magnitude in Low-Rank Adaptation

Authors: Zicheng Zhang, Haoran Li, Yifeng Zhang, Guoqiang Gong, Jiaxing Wang, Junxing Hu, Pengzhang Liu, Qixia Jiang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that LoRAM serves as a strong baseline, retaining the full efficiency of LoRA while matching or outperforming spectral initialization across benchmarks.
Researcher Affiliation	Collaboration	Zicheng Zhang¹, Haoran Li², Yifeng Zhang¹, Guoqiang Gong¹, Jiaxing Wang¹, Junxing Hu¹, Pengzhang Liu¹, Qixia Jiang¹. ¹JD.com, ²University of Chinese Academy of Sciences
Pseudocode	Yes	Algorithm 1: LoRAM Initialization Procedure. Input: Pretrained weight W ∈ ℝ^(n×m), target rank r. Output: Initialized parameters A^(0), B^(0), W
Open Source Code	No	Footnote 1: "Code is available here." In the NeurIPS Paper Checklist, Question 5: "The code will be made available upon acceptance of the article."
Open Datasets	Yes	For math tasks, the model is tuned on MetaMathQA [53] and evaluated on GSM8K [54] and MATH [53] validation sets. For coding tasks, we use CodeFeedback [55] as training dataset, with evaluations on HumanEval [56] and MBPP [57]. For commonsense tasks, model is tuned on Commonsense170K [58], and we report averaged accuracy on eight sub-datasets. We evaluate the NLU performance by fine-tuning the DeBERTa-v3-base model [50] with a rank of 8 on eight tasks in the GLUE benchmark [59].
Dataset Splits	Yes	For math tasks, the model is tuned on MetaMathQA [53] and evaluated on GSM8K [54] and MATH [53] validation sets. For coding tasks, we use CodeFeedback [55] as training dataset, with evaluations on HumanEval [56] and MBPP [57]... We evaluate the NLU performance by fine-tuning the DeBERTa-v3-base model [50] with a rank of 8 on eight tasks in the GLUE benchmark [59]. We utilize scripts from the Transformers Library [47] to ensure a fair comparison.
Hardware Specification	Yes	All experiments are run on servers with 8 NVIDIA H800 GPUs.
Software Dependencies	No	We conduct comprehensive experiments to evaluate LoRAM efficiently implemented via the PEFT library [6]. We utilize scripts from the Transformers Library [47] to ensure a fair comparison.
Experiment Setup	Yes	Our setup strictly follows PiSSA [40], using the AdamW optimizer with a batch size of 128, a learning rate of 2 × 10⁻⁵, a warmup ratio of 0.03, and no weight decay. All experiments are performed on subsets containing 100K data points for one epoch to minimize training overhead. For NLU tasks: All methods are trained with a learning rate of 1 × 10⁻⁴ for 3 training epochs, except for MRPC, which uses 5 epochs due to its smaller size.
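
To make the quoted training budget concrete, the sketch below collects the hyperparameters from the Experiment Setup row and derives the implied number of optimizer steps. It is only an illustration: the class name `FineTuneSetup` and its methods are hypothetical, the learning rate is read as 2 × 10⁻⁵, the batch size of 128 is assumed to be the global batch size, and the rounding rules (keep the last partial batch, round warmup up) are assumptions that the paper's actual training scripts may not share.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneSetup:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""

    optimizer: str = "AdamW"
    batch_size: int = 128          # assumed to be the global batch size
    learning_rate: float = 2e-5
    warmup_ratio: float = 0.03
    weight_decay: float = 0.0      # "no weight decay"
    num_examples: int = 100_000    # "subsets containing 100K data points"
    num_epochs: int = 1

    def total_steps(self) -> int:
        # Assumption: the final partial batch is kept, so round up per epoch.
        return math.ceil(self.num_examples / self.batch_size) * self.num_epochs

    def warmup_steps(self) -> int:
        # Assumption: the warmup length is rounded up to a whole step.
        return math.ceil(self.warmup_ratio * self.total_steps())


setup = FineTuneSetup()
print(setup.total_steps(), setup.warmup_steps())
```

Under these assumptions, one epoch over 100K examples comes to 782 optimizer steps, of which about 24 are warmup.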