Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models

Authors: Lancheng Zou, Shuo Yin, Zehua Pei, Tsung-Yi Ho, Farzan Farnia, Bei Yu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on the LLaMA series, Qwen, and OPT models demonstrate that PermLLM achieves superior performance in optimizing N:M sparse models. The code is available at https://github.com/lanchengzou/PermLLM.
Researcher Affiliation	Academia	Lancheng Zou¹, Shuo Yin¹, Zehua Pei¹, Tsung-Yi Ho¹, Farzan Farnia¹, and Bei Yu¹; ¹The Chinese University of Hong Kong
Pseudocode	No	The paper describes methods using mathematical equations and prose, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code	Yes	The code is available at https://github.com/lanchengzou/PermLLM.
Open Datasets	Yes	We randomly select 128 samples from the C4 dataset [47], each comprising 1024 tokens, to serve as the calibration data for all evaluated models. We utilize five zero-shot evaluation tasks: HellaSwag [60], ARC-(Easy and Challenge) [9], OpenBookQA [41] and RTE [53] from lm-evaluation-harness [18] and one language modeling dataset: Wikitext2 [40] to evaluate the performance of the sparse models.
Dataset Splits	Yes	We randomly select 128 samples from the C4 dataset [47], each comprising 1024 tokens, to serve as the calibration data for all evaluated models. We utilize five zero-shot evaluation tasks: HellaSwag [60], ARC-(Easy and Challenge) [9], OpenBookQA [41] and RTE [53] from lm-evaluation-harness [18] and one language modeling dataset: Wikitext2 [40] to evaluate the performance of the sparse models.
Hardware Specification	Yes	The experiments of PermLLM are conducted on A100 GPUs.
Software Dependencies	No	We implement PermLLM with Pytorch [43] and Hugging Face Transformers library [56]. (No specific version numbers are provided for these libraries.)
Experiment Setup	Yes	For the proposed PermLLM framework, we utilize AdamW [33] as the optimizer, with the learning rate set from {1e-3, 5e-3} for all models. The iteration of Sinkhorn normalization is 5. The temperature τ is linearly decayed from 1 to 0.1 to control the hardness of the soft permutation matrix in Equation (5). The block size for block-wise learnable channel permutation is set to 64, as it offers a balanced trade-off between performance and efficiency. Specifically, we use 1e-3 for PermLLM_Wanda and 5e-3 for PermLLM_RIA. We use 50 iterations for permutation learning.
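
The Experiment Setup row mentions 5 Sinkhorn normalization iterations and a temperature τ decayed linearly from 1 to 0.1 over 50 iterations. The sketch below illustrates how those two settings could interact. It is not the authors' implementation: the helper names `linear_temperature` and `sinkhorn`, the form exp(scores / τ), the per-step decay formula, and the toy 4x4 block (the paper reports a block size of 64) are all assumptions made for illustration.

```python
import math
import random


def linear_temperature(step, total_steps, tau_start=1.0, tau_end=0.1):
    """Linearly interpolate the temperature from tau_start to tau_end.

    Assumption: the decay is applied per optimization step and reaches
    tau_end on the final step.
    """
    if total_steps <= 1:
        return tau_end
    frac = step / (total_steps - 1)
    return tau_start + frac * (tau_end - tau_start)


def sinkhorn(scores, tau, n_iters=5):
    """Alternate row and column normalization of exp(scores / tau).

    The result approaches a doubly stochastic matrix; a lower tau pushes
    it closer to a hard permutation matrix.
    """
    n = len(scores)
    # Subtract the global maximum before exponentiating for numerical stability.
    m = max(max(row) for row in scores)
    p = [[math.exp((s - m) / tau) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Column normalization: every column sums to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


if __name__ == "__main__":
    random.seed(0)
    block = 4  # toy size for readability; the source reports a block size of 64
    scores = [[random.gauss(0.0, 1.0) for _ in range(block)] for _ in range(block)]
    total_steps = 50  # matches the 50 permutation-learning iterations in the source
    for step in (0, 25, 49):
        tau = linear_temperature(step, total_steps)
        soft_perm = sinkhorn(scores, tau)
        sharpest = max(max(row) for row in soft_perm)
        print(f"step={step:2d} tau={tau:.3f} max entry={sharpest:.3f}")
```

In a real training loop the score matrix would be a learnable parameter updated by the optimizer, and the soft matrix would still need to be converted to a hard permutation; both steps are omitted here.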