Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Lua-LLM: Learning Unstructured-Sparsity Allocation for Large Language Models

Authors: Mingge Lu, Jingwei Sun, Junqing Lin, Zechun Zhou, Guangzhong Sun

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on LLaMA and OPT families demonstrate significant performance improvements over existing methods. ... We evaluate Lua-LLM on several LLMs, including LLaMA-7B/13B, LLaMA2-7B/13B, LLaMA3-8B, and OPT-6.7B/13B. ... In Table 1, we report the language modeling perplexity of pruned LLaMA and OPT models from 50% to 80% sparsity levels. ... In addition to model perplexity results, we report the zero-shot accuracy of LLaMA models at 70% sparsity level in Table 2.
Researcher Affiliation	Academia	University of Science and Technology of China
Pseudocode	Yes	Algorithm 1: The multi-stage Lua-LLM pruning algorithm. Input: training dataset X, pre-trained LLM model, and target sparsity p. Output: unstructured sparse model. 1: Initialization: integrate mask modules into Attn and MLP layers, prepare uniform importance scores within [0, 1], initialize all threshold parameters to target sparsity p. 2: for t = 1 to T do: 3: for each weight matrix W_l do: 4: Generate soft pruning mask M_l with row-wise thresholds {t_j}_{j=1}^{C_out} by Eqn. (7), 5: end for 6: Forward propagation: L_total = L_task(X; M ⊙ W) + λ_reg · L_reg(M; p), 7: Update row-wise threshold parameters during back-propagation, 8: end for 9: Save the row-wise threshold parameters for each weight matrix, 10: Pruning: computing hard masks for pruning by Eqn. (6).
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: See supplemental material. We provide our code with sufficient instructions to enable reproduction for the experiment results.
Open Datasets	Yes	To train the learnable pruning masks, we use 2048-token segments from C4 [59] dataset, which is also used to sample calibration data in previous works. We evaluate the language modeling perplexity on the validation set of raw-WikiText2 [53] dataset. ... we also evaluate the zero-shot accuracy of pruned models on seven downstream tasks, including BoolQ [11], PIQA [7], HellaSwag [77], WinoGrande [62], ARC-easy [12], ARC-challenge [12], and OpenBookQA [54], based on the EleutherAI LM-Evaluation-Harness [35] framework.
Dataset Splits	Yes	We evaluate the language modeling perplexity on the validation set of raw-WikiText2 [53] dataset. To ensure a fair comparison, the sequence length for all models is set to 2048. ... We use the testcase with batch size set to 8, input sequence length set to 32, and output sequence length set to 256.
Hardware Specification	Yes	For LLaMA-7B model, Lua-LLM learns sparsity allocation in only 1 hour on 2 NVIDIA A100 GPUs... The learnable threshold parameters are trained for 500 iterations, conducted on NVIDIA A100 80GB GPUs. ... We evaluate the pruned OPT-6.7B and OPT-13B models on an NVIDIA A100 80GB GPU
Software Dependencies	No	We implement Lua-LLM in PyTorch [57] and use Hugging Face transformers library [70] for the evaluated LLMs.
Experiment Setup	Yes	We utilize the AdamW [46] optimizer with the learning rate set to 5 × 10⁻³ and weight decay set to 0.05. The learnable threshold parameters are trained for 500 iterations... To ensure a fair comparison, the sequence length for all models is set to 2048. ... We use the testcase with batch size set to 8, input sequence length set to 32, and output sequence length set to 256.
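The quoted Algorithm 1 refers to Eqns. (6) and (7) and to a regularizer L_reg without giving their form, so the following is only a minimal sketch of the row-wise threshold idea under stated assumptions, not the authors' implementation. Assumed here: importance scores are the normalized rank of |w| within each output row (which makes them uniform in [0, 1], so a threshold equal to p prunes a fraction p of the row); a sigmoid with a hypothetical temperature `tau` stands in for the soft mask of Eqn. (7); a step at the threshold stands in for the hard mask of Eqn. (6); and `sparsity_reg` is a hypothetical squared-gap regularizer. The task loss L_task is omitted because it needs the LLM forward pass. All function names are illustrative.

```python
import math
from typing import List


def row_scores(row: List[float]) -> List[float]:
    """ASSUMPTION: importance = normalized rank of |w| within the row.
    Midpoint ranks (rank + 0.5) / n spread the scores uniformly over (0, 1)."""
    n = len(row)
    order = sorted(range(n), key=lambda k: abs(row[k]))
    scores = [0.0] * n
    for rank, k in enumerate(order):
        scores[k] = (rank + 0.5) / n
    return scores


def soft_mask(scores: List[float], t: float, tau: float = 0.05) -> List[float]:
    """ASSUMED stand-in for Eqn. (7): sigmoid((s - t) / tau), near 1 above t and near 0 below."""
    return [1.0 / (1.0 + math.exp(-(s - t) / tau)) for s in scores]


def hard_mask(scores: List[float], t: float) -> List[int]:
    """ASSUMED stand-in for Eqn. (6): keep a weight iff its score exceeds the row threshold t."""
    return [1 if s > t else 0 for s in scores]


def sparsity_reg(masks: List[List[float]], p: float) -> float:
    """HYPOTHETICAL L_reg: squared gap between mean mask density and target density 1 - p."""
    total = sum(len(m) for m in masks)
    density = sum(sum(m) for m in masks) / total
    return (density - (1.0 - p)) ** 2


def reg_grad(scores: List[List[float]], thresholds: List[float], p: float,
             tau: float = 0.05) -> List[float]:
    """Analytic d(sparsity_reg)/d(t_j) for each row threshold t_j (L_task omitted)."""
    masks = [soft_mask(s, t, tau) for s, t in zip(scores, thresholds)]
    total = sum(len(m) for m in masks)
    gap = sum(sum(m) for m in masks) / total - (1.0 - p)
    # d sigmoid((s - t) / tau) / dt = -sigmoid * (1 - sigmoid) / tau
    return [2.0 * gap * sum(-v * (1.0 - v) / tau for v in m) / total for m in masks]


# Toy weight matrix with C_out = 2 rows, target sparsity p = 0.5.
W = [[0.3, -1.2, 0.05, 0.8], [2.0, -0.1, 0.4, -0.7]]
p = 0.5
scores = [row_scores(r) for r in W]
thresholds = [p] * len(W)  # Algorithm 1, step 1: thresholds start at the target sparsity
hard = [hard_mask(s, t) for s, t in zip(scores, thresholds)]
print(hard)  # [[0, 1, 0, 1], [1, 0, 0, 1]]
# Approximately 0: the soft-mask density already equals 1 - p.
print(sparsity_reg([soft_mask(s, t) for s, t in zip(scores, thresholds)], p))

# One plain gradient step on L_reg alone toward a higher target (p = 0.75) raises every threshold.
lr = 1.0  # hypothetical step size for this illustration only
grads = reg_grad(scores, thresholds, 0.75)
new_thresholds = [t - lr * g for t, g in zip(thresholds, grads)]
print(new_thresholds)
```

Because the rank-based scores are uniform, initializing every threshold to p reproduces exactly the uniform sparsity p in each row; learning then only has to shift thresholds to reallocate sparsity between rows and matrices.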
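The perplexity protocol quoted in the Open Datasets and Dataset Splits rows (raw-WikiText2 validation set, sequence length 2048) can be illustrated with a minimal sketch. The tokenized split is cut into non-overlapping 2048-token windows, and perplexity is the exponential of the mean next-token negative log-likelihood over all windows. Dropping the incomplete final window is a common convention assumed here, not a detail stated in the quoted text, and `nll_fn` is a hypothetical stand-in for a forward pass of the pruned model.

```python
import math
from typing import Callable, List, Sequence


def windows(tokens: Sequence[int], seqlen: int = 2048) -> List[Sequence[int]]:
    """Non-overlapping windows of length seqlen; the incomplete tail is dropped (assumed convention)."""
    n = len(tokens) // seqlen
    return [tokens[i * seqlen:(i + 1) * seqlen] for i in range(n)]


def perplexity(tokens: Sequence[int],
               nll_fn: Callable[[Sequence[int]], List[float]],
               seqlen: int = 2048) -> float:
    """exp(mean next-token NLL), pooled over every predicted token in every window."""
    total_nll, total_pred = 0.0, 0
    for w in windows(tokens, seqlen):
        nlls = nll_fn(w)  # hypothetical: one NLL (in nats) per predicted position
        total_nll += sum(nlls)
        total_pred += len(nlls)
    return math.exp(total_nll / total_pred)


def uniform_nll(window: Sequence[int]) -> List[float]:
    """Hypothetical model that is uniform over a 100-token vocabulary: every NLL is ln(100)."""
    return [math.log(100)] * (len(window) - 1)


# Sanity check: a uniform model over V tokens has perplexity V.
print(len(windows(list(range(5000)))))  # 2 full windows of 2048 tokens
print(perplexity(list(range(5000)), uniform_nll))  # ~100.0
```

Fixing the window length matters for comparability: the same model scores differently at different context lengths, which is why the quoted setup pins the sequence length to 2048 for all models.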
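The Experiment Setup row specifies AdamW with learning rate 5 × 10⁻³ and weight decay 0.05, run for 500 iterations. As a sketch of what those two hyperparameters do to a single learnable threshold, the code below writes out one decoupled-weight-decay AdamW step following the update rule documented for PyTorch's `torch.optim.AdamW`. The betas (0.9, 0.999) and eps 1e-8 are PyTorch's defaults and are assumed here because the quoted setup does not report them; `AdamWState`, `adamw_step`, and the gradient values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AdamWState:
    m: float = 0.0  # first-moment (mean) estimate
    v: float = 0.0  # second-moment (uncentered variance) estimate
    step: int = 0


def adamw_step(theta: float, grad: float, state: AdamWState,
               lr: float = 5e-3, weight_decay: float = 0.05,
               betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> float:
    """One AdamW update on a scalar, mirroring the algorithm in the PyTorch docs (amsgrad off)."""
    b1, b2 = betas
    state.step += 1
    theta = theta - lr * weight_decay * theta  # decoupled decay acts on the parameter itself
    state.m = b1 * state.m + (1 - b1) * grad
    state.v = b2 * state.v + (1 - b2) * grad * grad
    m_hat = state.m / (1 - b1 ** state.step)  # bias-corrected moments
    v_hat = state.v / (1 - b2 ** state.step)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# A threshold initialized to the target sparsity p = 0.5, with hypothetical gradients.
t_small = adamw_step(0.5, grad=0.1, state=AdamWState())
t_large = adamw_step(0.5, grad=100.0, state=AdamWState())
print(t_small, t_large)  # both ~0.494875
```

The first step moves the threshold by roughly the learning rate regardless of the gradient's scale (here 0.1 versus 100), plus a small pull toward zero from the weight decay; this scale-insensitivity is one plausible reason an adaptive optimizer suits threshold parameters whose gradients come from very different layers.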