Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Authors: Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated across a diverse set of benchmarks, SparseLoRA achieves a computational cost reduction of up to 2.2× and a wall-clock speedup of up to 1.6× while maintaining accuracy on various downstream tasks, including commonsense and arithmetic reasoning, code generation, and complex instruction following. |
| Researcher Affiliation | Collaboration | 1University of Toronto, 2UC Berkeley, 3MIT, 4Google DeepMind. |
| Pseudocode | Yes | Algorithm 1 SVD Sparsity Estimator |
| Open Source Code | No | The paper provides a URL (https://z-lab.ai/projects/sparselora), which appears to be a project page. However, the paper does not contain an explicit statement like "We release our code at..." or a direct link to a code repository for the methodology it describes. |
| Open Datasets | Yes | Benchmarks. We conduct experiments on five downstream tasks. The first set focuses on commonsense reasoning (referred to as CSR170K) and includes eight datasets: BoolQ (Clark et al., 2019), PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC-Easy and ARC-Challenge (Clark et al., 2018), and OpenBookQA (Mihaylov et al., 2018). The second set focuses on arithmetic reasoning (referred to as Math10K) and includes three benchmarks: GSM8K (Cobbe et al., 2021), MAWPS (Koncel-Kedziorski et al., 2016), and SVAMP (Patel et al., 2021). |
| Dataset Splits | Yes | Following the practices established by Hu et al. (2023) and Liu et al. (2024c), we fine-tune our models on the combined training sets of all sub-tasks within each respective benchmark. We run each experiment five times, discard the highest and lowest performing runs, and report the average accuracy of the remaining three. |
| Hardware Specification | Yes | Efficiency metrics are derived from an NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper mentions using LoRA with specific parameters (dropout = 0, rank = 32, and α = 64) but does not specify software dependencies like programming language versions (e.g., Python), library versions (e.g., PyTorch, TensorFlow), or CUDA versions. |
| Experiment Setup | Yes | Table 12: Training Hyperparameters Across Datasets. All experiments use LoRA with dropout = 0, rank = 32, and α = 64. CSR170K: seq. len 512, batch size 8, 1 epoch, LR 3e-4, cosine scheduler, warmup ratio 0.04; Math10K: 512, 8, 3 epochs, 3e-4, cosine, 0.04; GLUE (CoLA, STS-B, RTE, SST-2, QNLI, MNLI, QQP): 128, 8, 3 epochs, 5e-5, cosine, 0.04; GLUE (MRPC, WNLI): 128, 8, 5 epochs, 5e-5, cosine, 0.04; Code Feedback: 1024, 6, 1 epoch, 2e-5, cosine, 0.04; WizardLM: 2048, 2, 1 epoch, 2e-5, cosine, 0.04. |
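The evaluation protocol quoted under "Dataset Splits" (run each experiment five times, discard the highest and lowest runs, average the remaining three) amounts to a trimmed mean. A minimal sketch in Python; the accuracy values are hypothetical placeholders, not results from the paper:

```python
def trimmed_mean(accuracies):
    """Average after discarding the single highest and single lowest run."""
    if len(accuracies) < 3:
        raise ValueError("need at least 3 runs to trim both extremes")
    trimmed = sorted(accuracies)[1:-1]
    return sum(trimmed) / len(trimmed)

# Five hypothetical run accuracies (%): 79.5 and 82.0 are dropped,
# and the remaining three (80.1, 80.4, 81.2) are averaged.
runs = [81.2, 79.5, 80.1, 82.0, 80.4]
print(round(trimmed_mean(runs), 2))  # → 80.57
```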
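For quick reference, the Table 12 hyperparameters can be restated as a plain lookup table. The dictionary keys and field names below are our own shorthand, not identifiers from the paper; the values are copied from the quoted table:

```python
# Per-dataset training hyperparameters quoted from Table 12.
# "GLUE-small" is our shorthand for the MRPC/WNLI group, which trains
# for 5 epochs instead of 3; all other names are also our own labels.
TRAIN_CONFIGS = {
    "CSR170K":      {"seq_len": 512,  "batch_size": 8, "epochs": 1, "lr": 3e-4},
    "Math10K":      {"seq_len": 512,  "batch_size": 8, "epochs": 3, "lr": 3e-4},
    "GLUE-main":    {"seq_len": 128,  "batch_size": 8, "epochs": 3, "lr": 5e-5},
    "GLUE-small":   {"seq_len": 128,  "batch_size": 8, "epochs": 5, "lr": 5e-5},
    "CodeFeedback": {"seq_len": 1024, "batch_size": 6, "epochs": 1, "lr": 2e-5},
    "WizardLM":     {"seq_len": 2048, "batch_size": 2, "epochs": 1, "lr": 2e-5},
}

# Settings shared by every run: cosine LR schedule with 0.04 warmup
# ratio, and LoRA with dropout = 0, rank = 32, alpha = 64.
SHARED = {"scheduler": "cosine", "warmup_ratio": 0.04,
          "lora_dropout": 0.0, "lora_rank": 32, "lora_alpha": 64}

print(TRAIN_CONFIGS["Math10K"]["epochs"])  # → 3
```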