Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
Authors: Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the effectiveness of our proposed method over several vision and language models. First we show the effect of the proposed method for the fine-grained N:M structured sparsity on the neural gradients. Then we combine this method with the fine-grained N:M transposable-weights method (Hubara et al., 2021), allowing the acceleration with N:M structured sparsity in all training GEMM. Moreover, we show the combination of N:M structured sparsity in all training GEMM with 8-bit quantization achieving non or small accuracy degradation. Experimental details appear in Appendix A.4. |
| Researcher Affiliation | Collaboration | Habana Labs, an Intel company, Caesarea, Israel; Department of Electrical Engineering, Technion, Haifa, Israel |
| Pseudocode | No | The paper mentions providing a method in Appendix A.2 ("We provide such a method in Appendix A.2.") and a reference implementation in supplementary material, but the pseudocode or algorithm blocks themselves are not directly present in the provided text. |
| Open Source Code | Yes | A reference implementation is supplied in the supplementary material. |
| Open Datasets | Yes | Table 4: Effect of applying the proposed MVUE 1:2 and approx-MVUE 2:4 on the neural gradients for different models and datasets. ... ResNet18 ImageNet ... ResNet50 ImageNet ... ViT-B16 Cifar10 ... Bert finetune Squad ... Bert pretrain Wiki ... Transformer WMT En-De |
| Dataset Splits | Yes | The paper uses well-known benchmark datasets such as ImageNet, CIFAR-10, SQuAD, Wiki, and WMT En-De, which typically have standard train/validation/test splits. Additionally, it states "Experimental details appear in Appendix A.4.", which would typically specify any non-standard splits or confirm standard ones. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. It mentions general hardware discussions in references (Nvidia A100, H100, Graphcore IPU, Habana Gaudi) but not as the experimental setup. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library names like PyTorch, TensorFlow, along with their versions) needed to replicate the experiment. |
| Experiment Setup | Yes | Experimental details appear in Appendix A.4. |