Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models
Authors: Mingge Lu, Jingwei Sun, Junqing Lin, Zechun Zhou, Guangzhong Sun
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on LLaMA and OPT families demonstrate significant performance improvements over existing methods. ... We evaluate Lua-LLM on several LLMs, including LLaMA-7B/13B, LLaMA2-7B/13B, LLaMA3-8B, and OPT-6.7B/13B. ... In Table 1, we report the language modeling perplexity of pruned LLaMA and OPT models from 50% to 80% sparsity levels. ... In addition to model perplexity results, we report the zero-shot accuracy of LLaMA models at 70% sparsity level in Table 2. |
| Researcher Affiliation | Academia | University of Science and Technology of China EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 The multi-stage Lua-LLM pruning algorithm. Input: training dataset $X$, pre-trained LLM model, and target sparsity $p$. Output: unstructured sparse model. 1: Initialization: integrate mask modules into Attn and MLP layers, prepare uniform importance scores within [0, 1], initialize all threshold parameters to target sparsity $p$. 2: for $t = 1$ to $T$ do: 3: for each weight matrix $W_l$ do: 4: Generate soft pruning mask $M_l$ with row-wise thresholds $\{t_j\}_{j=1}^{C_{\text{out}}}$ by Eqn.(7), 5: end for 6: Forward propagation: $\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}}(X; M \odot W) + \lambda_{\text{reg}} \mathcal{L}_{\text{reg}}(M; p)$, 7: Update row-wise threshold parameters during back-propagation, 8: end for 9: Save the row-wise threshold parameters for each weight matrix, 10: Pruning: computing hard masks for pruning by Eqn.(6). |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: See supplemental material. We provide our code with sufficient instructions to enable reproduction for the experiment results. |
| Open Datasets | Yes | To train the learnable pruning masks, we use 2048-token segments from C4 [59] dataset, which is also used to sample calibration data in previous works. We evaluate the language modeling perplexity on the validation set of raw-WikiText2 [53] dataset. ... we also evaluate the zero-shot accuracy of pruned models on seven downstream tasks, including BoolQ [11], PIQA [7], HellaSwag [77], WinoGrande [62], ARC-easy [12], ARC-challenge [12], and OpenBookQA [54], based on the EleutherAI LM-Evaluation-Harness [35] framework. |
| Dataset Splits | Yes | We evaluate the language modeling perplexity on the validation set of raw-WikiText2 [53] dataset. To ensure a fair comparison, the sequence length for all models is set to 2048. ... We use the testcase with batch size set to 8, input sequence length set to 32, and output sequence length set to 256. |
| Hardware Specification | Yes | For LLaMA-7B model, Lua-LLM learns sparsity allocation in only 1 hour on 2 NVIDIA A100 GPUs... The learnable threshold parameters are trained for 500 iterations, conducted on NVIDIA A100 80 GB GPUs. ... We evaluate the pruned OPT-6.7B and OPT-13B models on an NVIDIA A100 80 GB GPU |
| Software Dependencies | No | We implement Lua-LLM in PyTorch [57] and use Hugging Face transformers library [70] for the evaluated LLMs. |
| Experiment Setup | Yes | We utilize the AdamW [46] optimizer with the learning rate set to 5 × 10⁻³ and weight decay set to 0.05. The learnable threshold parameters are trained for 500 iterations... To ensure a fair comparison, the sequence length for all models is set to 2048. ... We use the testcase with batch size set to 8, input sequence length set to 32, and output sequence length set to 256. |
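Algorithm 1 in the table refers to Eqn.(6) and Eqn.(7), which are not reproduced on this page. The following is a minimal NumPy sketch of the mask mechanics under stated assumptions, not the authors' implementation: importance scores are taken to be per-row rank-normalized weight magnitudes (so they are uniform in [0, 1] and a threshold of p prunes a fraction p of each row), the soft mask is a temperature-scaled sigmoid, and `sparsity_regularizer`, the `temperature` and `lam_reg` values, and the reconstruction-error stand-in for L_task are all hypothetical choices.

```python
import numpy as np


def uniform_importance_scores(weight: np.ndarray) -> np.ndarray:
    """Assumption: rank-normalize |W| within each output row.

    Every row then holds the values (0.5, 1.5, ..., C_in - 0.5) / C_in, i.e. scores
    spread uniformly over (0, 1), so a row threshold t_j keeps roughly a 1 - t_j fraction.
    """
    c_in = weight.shape[1]
    # Applying argsort twice yields each entry's rank within its row (0 = smallest |w|)
    ranks = np.argsort(np.argsort(np.abs(weight), axis=1), axis=1)
    return (ranks + 0.5) / c_in


def soft_mask(scores: np.ndarray, thresholds: np.ndarray, temperature: float = 0.05) -> np.ndarray:
    """Hypothetical stand-in for Eqn.(7): a differentiable sigmoid relaxation, one threshold per row."""
    return 1.0 / (1.0 + np.exp(-(scores - thresholds[:, None]) / temperature))


def hard_mask(scores: np.ndarray, thresholds: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for Eqn.(6): keep entries whose score exceeds the row threshold."""
    return scores > thresholds[:, None]


def sparsity_regularizer(mask: np.ndarray, p: float) -> float:
    """Hypothetical L_reg(M; p): squared gap between the mask's sparsity and the target p."""
    return float((1.0 - mask.mean() - p) ** 2)


def total_loss(x: np.ndarray, weight: np.ndarray, mask: np.ndarray, p: float, lam_reg: float = 1.0) -> float:
    """L_total = L_task(X; M ⊙ W) + lam_reg * L_reg(M; p), with a proxy L_task.

    Stand-in L_task: mean squared error between the dense and masked layer outputs.
    """
    dense_out = x @ weight.T
    sparse_out = x @ (mask * weight).T
    task = float(np.mean((dense_out - sparse_out) ** 2))
    return task + lam_reg * sparsity_regularizer(mask, p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 10))   # one weight matrix W_l (C_out = 4, C_in = 10)
    p = 0.7                            # target sparsity
    S = uniform_importance_scores(W)
    t = np.full(W.shape[0], p)         # step 1: every row threshold t_j starts at p
    M_soft = soft_mask(S, t)           # step 4: soft mask used during training
    M_hard = hard_mask(S, t)           # step 10: hard mask used for final pruning
    print("hard-mask sparsity:", 1.0 - M_hard.mean())
    print("L_total with soft mask:", total_loss(rng.standard_normal((5, 10)), W, M_soft, p))
```

Under these assumptions, thresholds initialized to p make the step-1 hard mask equivalent to uniform per-row magnitude pruning; the learned sparsity allocation would come from training moving each row's t_j away from p.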
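For the WikiText2 perplexity results with sequence length 2048, a common protocol in prior LLM-pruning work is to concatenate the validation split, cut it into non-overlapping 2048-token segments, and exponentiate the average next-token negative log-likelihood. Whether Lua-LLM follows exactly this protocol is an assumption; `nll_fn` is a hypothetical stand-in for a model's mean per-token loss (in nats) on one segment.

```python
import math
from typing import Callable, List, Sequence


def split_into_segments(token_ids: Sequence[int], seq_len: int = 2048) -> List[Sequence[int]]:
    """Cut a token stream into non-overlapping segments; a tail shorter than seq_len is dropped."""
    n_segments = len(token_ids) // seq_len
    return [token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_segments)]


def perplexity(token_ids: Sequence[int],
               nll_fn: Callable[[Sequence[int]], float],
               seq_len: int = 2048) -> float:
    """exp(mean NLL) over fixed-length segments (hypothetical nll_fn returns mean nats per token)."""
    segments = split_into_segments(token_ids, seq_len)
    if not segments:
        raise ValueError("need at least seq_len tokens")
    # All segments have equal length, so the mean of per-segment means equals the token-level mean
    mean_nll = sum(nll_fn(seg) for seg in segments) / len(segments)
    return math.exp(mean_nll)


if __name__ == "__main__":
    tokens = list(range(5000))  # toy token stream: 5000 // 2048 = 2 full segments
    # A "model" that is uniform over 50 tokens has NLL ln(50) per token, hence perplexity 50
    uniform_nll = lambda seg: math.log(50)
    print(len(split_into_segments(tokens)), perplexity(tokens, uniform_nll))
```

Dropping the remainder keeps every segment at the full 2048-token context, which is why the reported sequence length matters for comparing perplexities across methods.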