Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LoRA vs Full Fine-tuning: An Illusion of Equivalence

Authors: Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study how LoRA and full fine-tuning change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that LoRA and full fine-tuning yield weight matrices whose singular value decompositions exhibit very different structure: weight matrices trained with LoRA have new, high-ranking singular vectors, which we call intruder dimensions, while those trained with full fine-tuning do not. Further, we extend the finding that LoRA forgets less than full fine-tuning and find its forgetting is vastly localized to the intruder dimension. By causally intervening on the intruder dimensions by changing their associated singular values post-fine-tuning, we show that they cause forgetting.
Researcher Affiliation | Academia | Reece Shuttleworth, Jacob Andreas, Antonio Torralba, Pratyusha Sharma. MIT CSAIL. EMAIL
Pseudocode | Yes | Algorithm 1: Finding intruder dimensions. Require: pre-trained weights W0, fine-tuned weights Wt, cosine similarity threshold ϵ, number of fine-tuned singular vectors to examine k. 1: [U0, Σ0, V0ᵀ] ← SVD(W0) 2: [Ut, Σt, Vtᵀ] ← SVD(Wt) 3: n_intruders ← 0 4: n ← number of pre-trained singular vectors 5: for j ← 1 to k do 6: if ∀ i ≤ n : cos(U0[i], Ut[j]) < ϵ then 7: n_intruders ← n_intruders + 1 8: end if 9: end for 10: return n_intruders
Open Source Code | Yes | We release code to replicate our main results (code is not attached to avoid breaking double-blind review, link will be attached upon acceptance). This GitHub repository contains exact commands needed to replicate our findings.
Open Datasets | Yes | For LLaMA2-7B, we follow Biderman et al. [2024] and measure forgetting as the average score on Hellaswag [Zellers et al., 2019], WinoGrande [Sakaguchi et al., 2021], Arc-Challenge [Clark et al., 2018]. For RoBERTa-base, we measure its pseudo-loss, which is analogous to language modelling loss for encoder-only models, as described by Salazar et al. [2020] on a sample of its pre-training dataset (as described by Liu et al. [2019]).
Dataset Splits | Yes | For LLaMA2-7B, we follow Biderman et al. [2024] and measure forgetting as the average score on Hellaswag [Zellers et al., 2019], WinoGrande [Sakaguchi et al., 2021], Arc-Challenge [Clark et al., 2018]... After training on a specific task, we test on all tasks by, for each task, separately retraining its classification head before testing on its test set.
Hardware Specification | Yes | All experiments were run on an internal, shared 8x A100-SXM4-80GB machine. All RoBERTa-base fine-tuning runs required a single A100 GPU. All evaluations and analyses also required a single A100 GPU.
Software Dependencies | No | We use the Adam optimizer [Kingma and Ba, 2017] with no weight decay and a maximum sequence length of 512. We also use the PEFT library, but no version numbers are provided.
Experiment Setup | Yes | For all models, we use a linear learning rate schedule with 0.06 linear warmup ratio and train for a maximum of 5 epochs with batch size 16. We use the Adam optimizer [Kingma and Ba, 2017] with no weight decay and a maximum sequence length of 512. We fine-tune all linear layers besides the embedding matrix. For full fine-tuning, we use a learning rate of 1e-5. For LoRA, we set α = 2r, and train for all ranks in {1, 2, 4, 8, 16, 64}. We hold the "total learning rate of LoRA", which is αη, fixed as we sweep rank such that this product always equals 2.4e-3.
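The LoRA sweep in the Experiment Setup row implies a per-rank learning rate. The sketch below works through that arithmetic, assuming η is the optimizer learning rate and that the constraint is exactly α · η = 2.4e-3 with α = 2r. The function and constant names are illustrative and do not come from the authors' code.

```python
# Illustrative names; only the numbers come from the Experiment Setup row.
TOTAL_LR = 2.4e-3              # fixed product alpha * eta
RANKS = [1, 2, 4, 8, 16, 64]   # LoRA ranks swept


def lora_hparams(rank, total_lr=TOTAL_LR):
    """Return alpha and the learning rate eta implied by alpha = 2r and alpha * eta = total_lr."""
    alpha = 2 * rank
    eta = total_lr / alpha
    return {"rank": rank, "alpha": alpha, "learning_rate": eta}


if __name__ == "__main__":
    for r in RANKS:
        print(lora_hparams(r))
```

Under these assumptions, the learning rate halves each time the rank doubles, so higher-rank adapters train with a smaller η while the product α · η stays constant.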
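Algorithm 1 in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' released code. It compares left singular vectors, as the pseudocode does. The `absolute` flag is an added assumption: the sign of a singular vector is arbitrary, so the sketch compares absolute cosine similarities by default, and `absolute=False` gives the literal reading of the pseudocode. The default values for `eps` and `k` are placeholders.

```python
import numpy as np


def count_intruder_dimensions(w0, w_tuned, eps=0.5, k=10, absolute=True):
    """Count top-k fine-tuned singular vectors that are dissimilar to every pre-trained one."""
    # Left singular vectors are the columns of U, ordered by decreasing singular value.
    u0, _, _ = np.linalg.svd(w0, full_matrices=False)
    ut, _, _ = np.linalg.svd(w_tuned, full_matrices=False)
    k = min(k, ut.shape[1])
    # Columns are unit-norm, so dot products are cosine similarities:
    # sims[i, j] = cos(U0[i], Ut[j]).
    sims = u0.T @ ut[:, :k]
    if absolute:
        # Assumption (not in the pseudocode): ignore the arbitrary sign of singular vectors.
        sims = np.abs(sims)
    # Vector j is an intruder if its similarity to every pre-trained vector is below eps.
    is_intruder = sims.max(axis=0) < eps
    return int(is_intruder.sum())


if __name__ == "__main__":
    # Toy example: a large rank-one update along a direction that is not
    # aligned with any pre-trained singular vector.
    w0 = np.diag([5.0, 4.0, 3.0, 2.0])
    v = np.full((4, 1), 0.5)
    w_tuned = w0 + 1000.0 * (v @ v.T)
    print(count_intruder_dimensions(w0, w0, eps=0.6, k=4))
    print(count_intruder_dimensions(w0, w_tuned, eps=0.6, k=1))
```

Computing the full similarity matrix in one matrix product replaces the inner "for all i" check of the pseudocode; for large weight matrices a truncated SVD may be preferable.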