Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Dynamic Low-Rank Sparse Adaptation for Large Language Models

Authors: Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Yang Liu, Jing Lin, Yiwu Yao, Rongrong Ji

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments tell that LoSA can efficiently boost the efficacy of sparse LLMs within a few hours, without introducing any additional inferential burden. For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU.
Researcher Affiliation	Collaboration	1Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, P.R. China. 2 Huawei Technologies. 3Institute of Artificial Intelligence, Xiamen University. 4 Peng Cheng Laboratory, Shenzhen, China.
Pseudocode	Yes	Algorithm 1: Dynamic Low-rank Sparse Adaptation (LoSA)
Open Source Code	Yes	Code is available at https://github.com/wzhuang-xmu/LoSA.
Open Datasets	Yes	We report perplexity of sparse LLM on WikiText-2 (Merity et al., 2016) dataset and use lm-eval-harness (Gao et al., 2021) to evaluate the zero-shot accuracy on downstream datasets, including HellaSwag (Zellers et al., 2019), Winogrande (Sakaguchi et al., 2021), BoolQ (Clark et al., 2019), OpenBookQA (Mihaylov et al., 2018), PIQA (Bisk et al., 2020), ARC-Easy, and ARC-Challenge (Clark et al., 2018).
Dataset Splits	No	The paper mentions using a "10K subset from the Alpaca-GPT4 (Peng et al., 2023) to construct our fine-tuning dataset" and "128 sequences sampled from the C4 training set (Raffel et al., 2020) for sparsification". While standard benchmark datasets are used for evaluation, no explicit train/test/validation splits (percentages, counts, or references to specific split configurations) are provided for any of the datasets used for fine-tuning or evaluation.
Hardware Specification	Yes	For example, LoSA reduced the perplexity of sparse LLaMA-2-7B by 68.73 and increased zero-shot accuracy by 16.32%, achieving a 2.60× speedup on CPU and 2.23× speedup on GPU, requiring only 45 minutes of fine-tuning on a single NVIDIA A100 80GB GPU. ... All experiments were conducted on NVIDIA A100 80GB GPUs. ... We measured the end-to-end time of the model generate tokens using the DeepSparse (Neural Magic, 2021) inference engine on an Intel(R) Xeon(R) Silver 4314 CPU and the nm-vllm (Neural Magic, 2024) inference engine on a NVIDIA RTX 4090 24GB GPU.
Software Dependencies	No	The paper mentions using 'Paged AdamW optimizer', 'DeepSparse inference engine', 'nm-vllm inference engine', and 'lm-eval-harness' but does not specify version numbers for these software components or any other libraries/frameworks.
Experiment Setup	Yes	During the fine-tuning process, we employed the Paged AdamW optimizer (Dettmers et al., 2024), setting a maximum gradient norm of 0.3. The learning rate followed a linear learning rate schedule and set the learning rate to be 2 × 10⁻⁴. ... We set the fine-tuning steps T = 5 and initial average rank Ω1 = 6.
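
Illustrative sketch (not from the paper or its repository): the Experiment Setup row quotes a linear learning-rate schedule with a learning rate of 2 × 10⁻⁴ and a maximum gradient norm of 0.3. The plain-Python snippet below shows one way those two settings could be interpreted. The function names, the `total_steps` parameter, the absence of warmup, and the decay to zero are all assumptions made for illustration; the quoted text does not specify them, and `total_steps` here is unrelated to the paper's T = 5.

```python
import math

PEAK_LR = 2e-4        # learning rate quoted in the Experiment Setup row
MAX_GRAD_NORM = 0.3   # maximum gradient norm quoted in the same row


def linear_lr(step, total_steps, peak_lr=PEAK_LR):
    """Linearly decay the learning rate from peak_lr at step 0 to 0 at total_steps.

    Assumption: no warmup and decay to zero; the source only says "linear".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already within the bound; return an unchanged copy
    scale = max_norm / norm
    return [g * scale for g in grads]
```

Under these assumptions, `linear_lr(step, total_steps)` would be queried once per optimizer step, and `clip_by_global_norm` would be applied to the gradients before each update. A real training run would more likely rely on the equivalent utilities of its deep learning framework.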