Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models

Authors: Lancheng Zou, Shuo Yin, Zehua Pei, Tsung-Yi Ho, Farzan Farnia, Bei Yu

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on the LLaMA series, Qwen, and OPT models demonstrate that PermLLM achieves superior performance in optimizing N:M sparse models. The code is available at https://github.com/lanchengzou/PermLLM. |
| Researcher Affiliation | Academia | Lancheng Zou¹, Shuo Yin¹, Zehua Pei¹, Tsung-Yi Ho¹, Farzan Farnia¹, and Bei Yu¹; ¹The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes methods using mathematical equations and prose, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | The code is available at https://github.com/lanchengzou/PermLLM. |
| Open Datasets | Yes | We randomly select 128 samples from the C4 dataset [47], each comprising 1024 tokens, to serve as the calibration data for all evaluated models. We utilize five zero-shot evaluation tasks: HellaSwag [60], ARC-(Easy and Challenge) [9], OpenBookQA [41] and RTE [53] from lm-evaluation-harness [18] and one language modeling dataset: Wikitext2 [40] to evaluate the performance of the sparse models. |
| Dataset Splits | Yes | We randomly select 128 samples from the C4 dataset [47], each comprising 1024 tokens, to serve as the calibration data for all evaluated models. We utilize five zero-shot evaluation tasks: HellaSwag [60], ARC-(Easy and Challenge) [9], OpenBookQA [41] and RTE [53] from lm-evaluation-harness [18] and one language modeling dataset: Wikitext2 [40] to evaluate the performance of the sparse models. |
| Hardware Specification | Yes | The experiments of PermLLM are conducted on A100 GPUs. |
| Software Dependencies | No | We implement PermLLM with PyTorch [43] and Hugging Face Transformers library [56]. (No specific version numbers are provided for these libraries.) |
| Experiment Setup | Yes | For the proposed PermLLM framework, we utilize AdamW [33] as the optimizer, with the learning rate set from {1e-3, 5e-3} for all models. The iteration of Sinkhorn normalization is 5. The temperature τ is linearly decayed from 1 to 0.1 to control the hardness of the soft permutation matrix in Equation (5). The block size for block-wise learnable channel permutation is set to 64, as it offers a balanced trade-off between performance and efficiency. Specifically, we use 1e-3 for PermLLM_Wanda and 5e-3 for PermLLM_RIA. We use 50 iterations for permutation learning. |
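
The experiment setup mentions 5 Sinkhorn normalization iterations and a temperature τ decayed linearly from 1 to 0.1. The sketch below illustrates how those two pieces could fit together. It is a generic Sinkhorn normalization in plain Python, not the authors' implementation and not necessarily the exact form of Equation (5), which is not reproduced on this page. The function names, the 3×3 example logits, and the assumption that the decay spans the 50 permutation-learning iterations are all illustrative.

```python
import math


def linear_temperature(step, total_steps, start=1.0, end=0.1):
    """Linearly interpolate the temperature from `start` to `end`.

    Assumption: the decay runs over all `total_steps` learning iterations.
    """
    if total_steps <= 1:
        return end
    return start + (end - start) * step / (total_steps - 1)


def sinkhorn(logits, tau, n_iters=5):
    """Return an approximately doubly stochastic matrix from square `logits`.

    Computes exp(logits / tau), then alternates row and column normalization
    `n_iters` times. A smaller `tau` pushes the result toward a hard
    permutation matrix.
    """
    n = len(logits)
    mat = []
    for row in logits:
        # Subtracting the row maximum avoids overflow; the row normalization
        # below absorbs this per-row scaling.
        row_max = max(row)
        mat.append([math.exp((x - row_max) / tau) for x in row])
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        normalized = []
        for row in mat:
            row_sum = sum(row)
            normalized.append([x / row_sum for x in row])
        mat = normalized
        # Column normalization: each column sums to 1.
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat


# Hypothetical 3x3 logits; the setup above uses blocks of size 64.
logits = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0], [1.0, 2.0, 0.0]]
total_steps = 50  # permutation-learning iterations reported in the setup

soft = sinkhorn(logits, linear_temperature(0, total_steps))
hard = sinkhorn(logits, linear_temperature(total_steps - 1, total_steps))

for row in hard:
    print([round(x, 3) for x in row])
```

With a block size of 64, one such matrix would be maintained per 64-channel block rather than for the full channel dimension, which keeps each normalization small.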