Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections

Authors: Xin Yu, Yujia Wang, Jinghui Chen, Lingzhou Xue

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across multiple tasks demonstrate that AltLoRA outperforms LoRA and its variants, narrowing the gap toward full fine-tuning while preserving superior memory efficiency. [...] 5 Experimental Results: This section empirically shows the effectiveness of our approach across various model architectures and datasets. Section 5.1 summarizes the experimental settings and results on supervised fine-tuning (SFT) benchmark tasks, and Section 5.2 provides details of the setup and results for natural language understanding tasks. Finally, ablation studies from multiple perspectives are presented in Section 5.3. |
| Researcher Affiliation | Academia | Xin Yu, Department of Statistics, The Pennsylvania State University, State College, PA 16803, EMAIL; Yujia Wang, College of Information Sciences and Technology, The Pennsylvania State University, State College, PA 16803, EMAIL; Jinghui Chen, College of Information Sciences and Technology, The Pennsylvania State University, State College, PA 16803, EMAIL; Lingzhou Xue, Department of Statistics, The Pennsylvania State University, State College, PA 16803, EMAIL |
| Pseudocode | Yes | After analyzing how to efficiently optimize both the gradient and momentum under limited resource constraints, we summarize our proposed algorithm, AltLoRA, in Algorithm 1. Unlike the joint update strategy, AltLoRA updates only one of the low-rank matrices, either A or B, at each step, based on the scaled gradient and momentum presented in Theorems 1 and 2. [...] Algorithm 1: AltLoRA: Gradient Approximation via Alternating Projection with Proper Momentum Design under LoRA's Memory Constraint [...] Algorithm 2: AltLoRA+: AltLoRA with Second Order Momentum |
| Open Source Code | Yes | The code for our project is available at https://github.com/LucasXinYu/AltLoRA. |
| Open Datasets | Yes | We assess our methods on dialogue generation with the WizardLM dataset [72], mathematical reasoning with the MetaMathQA dataset [80], and code generation with the CodeFeedback dataset [90] using the Llama-3.1-8B and Llama-3-8B models [17] (see Appendix E.1). [...] For the dialogue generation task, we use the MT-Bench dataset [89] with GPT-4o... For the math task, we evaluate the model on the GSM8K test set [11]... For the code generation task, we evaluate on the HumanEval dataset [6]... We fine-tune the T5-based model [52] with our methods and the baselines on a subset of GLUE datasets [63]: MNLI, SST2, CoLA, QNLI, and MRPC. |
| Dataset Splits | Yes | Dialogue Generation Task: We fine-tune large language models on a 52k subset of the WizardLM dataset [72] and evaluate it using the MT-Bench dataset [89]. [...] Math Task: We fine-tune large language models on a 100k sample from the MetaMathQA dataset [80]. The model is then evaluated on the GSM8K test set [11]... Coding Task: We fine-tune large language models on a 100k subset of the CodeFeedback dataset [90] and test it on the HumanEval dataset [6]... |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 and NVIDIA A6000 GPUs. |
| Software Dependencies | No | AltLoRA, as a novel PEFT method, can be seamlessly integrated into popular libraries such as Hugging Face Transformers [69]. The key engineering modifications are as follows: [...] |
| Experiment Setup | Yes | Unless otherwise stated, we fine-tune models using default hyperparameters (if used): β1 = 0.9, β2 = 0.999, and zero weight decay. We adopt a cosine learning rate schedule with a warm-up ratio of 0.03. LoRA adapters are applied to {Q, K, V, O} layers. By default, we set the rank to r = 8 and the scaling factor to α = 32 for dialogue generation tasks, and r = 8, α = 16 for the mathematical reasoning and code generation tasks. We carefully grid search the learning rates. [...] We set the sequence length to 1024 and the macro batch size to 4 for math and code tasks, and macro batch size to 8 for dialogue generation. |
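
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration sketch, which may help anyone attempting a reproduction. This is not the authors' code: the names `FineTuneConfig`, `CONFIGS`, `warmup_steps`, and `lr_at_step` are hypothetical, the linear warm-up shape is an assumption (the source states only a cosine schedule with a warm-up ratio of 0.03), the `alpha / r` scaling follows the common LoRA convention rather than anything stated here, and the sequence length of 1024 is assumed to apply to all three tasks. The peak learning rate is left as an argument because the grid-searched values are not quoted.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Illustrative container; field names are assumptions, values are quoted above."""
    task: str
    rank: int = 8                  # r = 8 for all three task families
    lora_alpha: int = 16           # 16 for math and code, 32 for dialogue
    beta1: float = 0.9
    beta2: float = 0.999
    weight_decay: float = 0.0
    warmup_ratio: float = 0.03
    batch_size: int = 4            # 4 for math and code, 8 for dialogue
    max_seq_len: int = 1024        # assumed to hold for every task
    target_modules: tuple = ("Q", "K", "V", "O")

    @property
    def scaling(self) -> float:
        # Assumption: the common LoRA convention alpha / r.
        return self.lora_alpha / self.rank


# Per-task defaults as quoted in the Experiment Setup row.
CONFIGS = {
    "dialogue": FineTuneConfig(task="dialogue", lora_alpha=32, batch_size=8),
    "math": FineTuneConfig(task="math"),
    "code": FineTuneConfig(task="code"),
}


def warmup_steps(total_steps: int, warmup_ratio: float = 0.03) -> int:
    """Number of warm-up steps implied by the ratio (rounded up)."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_ratio: float = 0.03) -> float:
    """Cosine decay to zero after an (assumed) linear warm-up from zero."""
    n_warm = warmup_steps(total_steps, warmup_ratio)
    if step < n_warm:
        return peak_lr * step / max(1, n_warm)
    progress = (step - n_warm) / max(1, total_steps - n_warm)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = CONFIGS["dialogue"]
    # peak_lr below is a placeholder, not a value from the paper.
    for step in (0, 15, 30, 500, 1000):
        print(step, lr_at_step(step, 1000, peak_lr=1e-4,
                               warmup_ratio=cfg.warmup_ratio))
```

Keeping the per-task differences (α and batch size) as overrides on shared defaults mirrors how the row itself is worded, so any value that differs from the quoted defaults is easy to spot.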